PREFACE


Introduction 

This book is about prediction and control of processes 
expressed by discrete-time models. It is assumed that the 
characteristics of the process may vary with time. The 
processes concerned may be linear or nonlinear, periodic or 
nonperiodic, single-input single-output or multi-input 
multi-output or simply output-only time series. 

In this book, the prime emphasis is on adaptive 
prediction. This is a field which is of interest to
practitioners and researchers belonging to various disciplines.
For the same objectives, usually different approaches are
used by different groups, sometimes unaware of alternative 
methods or unaware of implications as viewed by analysts of 
other disciplines. The prime aim of this book is to provide 
a unified and comprehensive coverage of the principles, 
perspectives and methods of adaptive prediction. One special 
feature of this book is the inclusion of a number of 
prediction methods, which are potent but are either new or 
are yet to be widely used. 

Control often follows predictions. Adaptive control is 
a more cohesive discipline than adaptive prediction. Again, 
within adaptive control, the predictive control classes are 
of particular interest because of their inherent robustness 
and implementability in difficult real-life situations. 
These control methods are based on predictions or specified 
predictive performances. This book presents an introductory 
exposure to the popular methods of predictive control. 

The numerical and computational aspects of the 
prediction and control methods used often influence their 
success in applications, and hence have been given due 
consideration as far as possible. 

This book is intended to be of use to students, 
researchers, practitioners, as well as to nonexperts. It can 
form a one semester course for graduate classes or may be 
selectively used for undergraduate classes. Complex 
mathematical symbols or expressions are avoided, and efforts 
have been made to ensure that lack of mathematical expertise 
does not hinder the readers’ comprehension of the subjects 
treated. So nonspecialists should also find this book
readable and understandable. At the same time, the rigour
required for the proper exposure of the subjects has not
been compromised.

There are 14 chapters in this book. The introductory
and background subjects are presented in the first three
chapters, which include discussions on process modelling,
selection of models, and estimation of parameters.
Chapter 4 provides an exposure to the popularly used methods
of prediction. Chapters 5 to 11 discuss various methods of
adaptive prediction for linear and nonlinear processes. In
brief, the studies include input-output model based
predictions, Kalman filter and state-space predictors,
orthogonal transformation based predictors, and predictors
based on hierarchical models including the Group Method of
Data Handling and neural networks. Chapters 12 and 13 are
devoted to predictive control; the study includes the
input-output model based long range predictive control and the
state-space model based method of linear quadratic control.
Chapter 14 discusses the concept of extraction of
information through smoothing and filtering of the data.

The book is largely self-contained, although conven- 
tional or widely studied topics are briefly dealt with; 
wherever possible newer features have been introduced and 
related interpretations have been added. The reader is 
expected to find that at least some of the subjects 
presented in Chapters 3, 7, 10, 11, and 14 are new. 

The theoretical discussions are supported by a large 
number of examples, application studies and case studies 
selected from diverse areas. The supportive appendices 
provide background materials, implementation ideas for the 
algorithms, computer programs etc., which are designed to 
help the readers’ understanding and to ease the efforts in 
implementing the presented ideas. 

This book is also supported by a floppy disc,
for which the author may be directly contacted.

The subject of this book is extensive, and newer facets 
are always appearing; this book is expected to provide a 
broad introductory coverage. In spite of all efforts, there 
may be omissions, errors or imprecisions in expressions; the 
readers are requested to convey their criticisms and 
suggestions for improvement, which will be most thankfully 
received by the author. 
CHAPTER 1


INTRODUCTION 


Meaningful predictions, control based on predictive 
performance, and robust implementation are the main 
themes of this book. 


The objective of a prediction exercise is to determine the 
future values of a variable based on the available 
information. The more representative the information, the 
better is the chance for producing close predictions. As the 
golden rule says that there is no golden rule, it can also 
be said that the best prediction is that the prediction 
cannot be the best. This is because, in real life, bestness 
cannot be precisely defined. Predictions depend on the data 
or measurements available, the system generating the data, 
the environment influencing the measurements, the dynamic 
state of the system, and the prior subjective knowledge 
about the process, etc.; there is a possibility of implicit
inaccuracy or imprecision with each of these. So the 
sensible objective will be to generate meaningful 
predictions, which is the prime subject of this book. 

If the process is controllable, the knowledge of the
predictive performance can be used in the design of control
laws for the process in order to drive its output to the
desired set point. If the predictions are dependable, the
control methods have a better chance of reaching the
targets in the expected time. Broadly speaking, the class of
control methods which incorporate information or
assumptions pertaining to the future is referred to as
predictive control, a subject which also forms a part of
this book.

The Prediction Problem 

The fundamental problem concerning prediction is the 
mathematical modelling of the process from the available 
information; the model is used to generate the predictions. 
The basic issues involved are as follows. 
(a) Representativeness of the data

The observations or measurements obtained from the process 
may be contaminated with noise, because the variable 
concerned may not be directly or precisely measurable. 

For example, consider the measurement of hardness of 
coal; one method is to collect bulk samples, which are then 
subjected to tumbling; the resulting degree of granulation 
leads to the computation of the hardness of coal. So it is 
important to appreciate that the variable of interest may 
not be the actual measurement but some underlying process 
value. 

Again consider the case of measurement of furnace
temperature using a thermocouple. Here an accuracy of 
measurement typically beyond the first decimal place in 
degrees centigrade is not usually expected. 

So the modelling exercise should incorporate the 
consideration of the true representativeness of the data. 
Besides the observed data, prior subjective knowledge about 
the process can greatly help formulate reliable models. 

(b) Statistical characterization of the data 

Often, to estimate the parameters of the model, the data are
assumed to possess certain idealistic statistical properties,
which may be only loosely true, e.g., the assumption
that the noise associated with a real-life data series is
white Gaussian. Also, if the data length is not long enough
(a term which again is not well defined, a typical figure
being 50 to 100 data points), the statistical
characterization of the data may not be dependable.

(c) Modelling versus prediction 

A model, which fits the data well, may not necessarily be 
representative, and hence may not produce sensible 
predictions. In fact with increased model order, it is 
always possible to get a closer fit with the data. The model 
fit against a separate block of data, not used for 
developing the model, will be a better indication of the 
validity of the model. If the model is valid irrespective of 
the choice of the data within the complete set, it is 
expected to be able to produce close predictions. 
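
The point can be illustrated with a minimal sketch. The simulated
second-order series, the split into a fitting block and a
validation block, and the helper names ar_design and fit_and_score
are all assumptions made for this illustration; they are not taken
from the text. Autoregressive models of increasing order are fitted
by least squares on the first block and scored on both blocks.

```python
import numpy as np

def ar_design(y, order, start):
    """Regressors [y(k-1), ..., y(k-order)] and targets y(k) for k >= start."""
    rows = [y[k - order:k][::-1] for k in range(start, len(y))]
    return np.array(rows), y[start:]

def fit_and_score(y_fit, y_val, order, max_order):
    """Fit an AR(order) model on y_fit; return (fit MSE, validation MSE)."""
    # The same rows (k >= max_order) are used for every order, so the
    # models are nested and the fit error cannot increase with order.
    X, t = ar_design(y_fit, order, max_order)
    theta, *_ = np.linalg.lstsq(X, t, rcond=None)
    Xv, tv = ar_design(y_val, order, max_order)
    fit_mse = np.mean((t - X @ theta) ** 2)
    val_mse = np.mean((tv - Xv @ theta) ** 2)
    return fit_mse, val_mse

# Hypothetical data: a stable second-order autoregressive series.
rng = np.random.default_rng(0)
n = 300
y = np.zeros(n)
for k in range(2, n):
    y[k] = 1.2 * y[k - 1] - 0.5 * y[k - 2] + rng.normal(scale=0.5)

y_fit, y_val = y[:100], y[100:]   # separate blocks of data
max_order = 8
scores = [fit_and_score(y_fit, y_val, p, max_order)
          for p in range(1, max_order + 1)]

for p, (fit_mse, val_mse) in enumerate(scores, start=1):
    print(f"order {p}: fit MSE = {fit_mse:.4f}, validation MSE = {val_mse:.4f}")
```

The fit error on the first block can only decrease as the order
grows, whereas the error on the separate block need not; the latter
is the more useful guide to the validity of the model.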

(d) Rate of adaptation 

Real life is not static. Hence, as time progresses, 
adaptation or modification of the parameters of the model 
will be necessary. The rate of adaptation depends on the 
degree of dynamics in the underlying process. The validity 
of prediction requires the model to be representative and
also to remain representative over the period of prediction. 

(e) Prediction and its validity 

Once a model is developed, the model and the available 
subjective knowledge may be used to produce predictions. 
There being no standard method of testing the validity of 
predictions, any validity test should incorporate the 
overall sense in the predictions. For example consider the 
mean square error (MSE) criterion: 

    \mathrm{mse} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 ,

where \hat{y}_i is the predicted value of y_i. The deviation of the
prediction at one point, which may be an outlier from an
erroneous measurement, can make the cumulative square error
too large irrespective of the predictions being otherwise
sensible. So although mean square error is an informative
index, it cannot be the only decisive factor for the quality
of prediction. In fact a plot of the predictions along with
the observations will provide a useful insight into the
closeness of prediction.
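
The sensitivity of the mean square error to a single bad point can
be seen in a minimal sketch. The sine-shaped series, the constant
prediction offset of 0.05 and the size of the outlier are
hypothetical values chosen for illustration. The median absolute
error is included only as a contrasting summary; it is not a
criterion proposed in the text.

```python
import numpy as np

# Hypothetical underlying process values and predictions that are
# uniformly close to them (a constant offset of 0.05).
y_true = np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
y_pred = y_true + 0.05

# Observations: identical to the process values except for one
# erroneous measurement (an outlier) at index 25.
y_obs = y_true.copy()
y_obs[25] += 10.0

def mse(y, y_hat):
    """Mean square error between observations and predictions."""
    return np.mean((y - y_hat) ** 2)

mse_clean = mse(y_true, y_pred)
mse_outlier = mse(y_obs, y_pred)
median_abs_err = np.median(np.abs(y_obs - y_pred))

print("MSE without the outlier:", mse_clean)
print("MSE with the outlier:   ", mse_outlier)
print("median absolute error:  ", median_abs_err)
```

The predictions are equally sensible in both cases, yet one
erroneous observation dominates the mean square error.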

(f) Implementation aspects 

Some of the important considerations from an implementation 
point of view are as follows. 

i) the numerical robustness (e.g., singular values are
numerically more robust than eigenvalues),

ii) the computational stability (e.g., UD updating through
square-root filtering of the error covariance is
computationally more stable than the direct covariance
matrix updating in recursive least squares
estimation),

iii) the real-timeliness property (e.g., low-pass filtering
introduces phase-shift, thereby damaging
real-timeliness).

In practice, once the user is familiar with the complexities 
involved with modelling and prediction, necessary steps may 
be taken in the choice of the process information, the 
algorithms and in the computations such that sensible 
predictions are produced. 

The modelling and prediction problem is viewed 
differently by people belonging to different disciplines, 
who have their own preferences for suitable methods. Some 
concepts are similar; for example, the central moving
average used by time series analysts is similar to fixed 
interval smoothing used by systems analysts or the 
bidirectional filtering used in the field of signal 
processing. On the other hand some practices are dissimilar; 
for example, often one tries to estimate the constant or 
average term like a_0 in

    y(k) = a_0 + a_1 y(k-1) + a_2 y(k-2) + e(k),

where y is the output variable, a_0, a_1, a_2 are the parameters
and e is the noise term; a control engineer will avoid
such a practice, owing to identifiability problems. This is
because, for sensible estimation, some amount of dynamism is
necessary in the data. Ideally the data need to be
persistently exciting. The estimation is difficult here, as
the data associated with a_0 remain static, at unity.
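
The identifiability difficulty can be sketched numerically. The
example assumes a second-order autoregression with a constant term,
with hypothetical parameter values 1.0, 0.5 and -0.3 and
hypothetical noise levels. The least squares regressor matrix has a
column of ones for the constant term; when the series has little
dynamism, the lagged-output columns become almost proportional to
that static column and the matrix is badly conditioned.

```python
import numpy as np

def simulate(noise_std, n=200, a0=1.0, a1=0.5, a2=-0.3, seed=0):
    """Simulate y(k) = a0 + a1*y(k-1) + a2*y(k-2) + e(k), started at its mean."""
    rng = np.random.default_rng(seed)
    mean = a0 / (1.0 - a1 - a2)
    y = np.full(n, mean)
    for k in range(2, n):
        y[k] = a0 + a1 * y[k - 1] + a2 * y[k - 2] + rng.normal(scale=noise_std)
    return y

def regressor_matrix(y):
    """Rows [1, y(k-1), y(k-2)] for k = 2, ..., n-1."""
    return np.column_stack([np.ones(len(y) - 2), y[1:-1], y[:-2]])

# Well excited data against nearly static data.
cond_excited = np.linalg.cond(regressor_matrix(simulate(noise_std=1.0)))
cond_static = np.linalg.cond(regressor_matrix(simulate(noise_std=1e-4)))

print("condition number, excited data:      ", cond_excited)
print("condition number, nearly static data:", cond_static)
```

A large condition number means that the constant term and the
dynamic parameters cannot be separated reliably from such data.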

In this book, some efforts are made to compile and 
combine concepts and methods popular with different 
disciplines. 

The control problem 

Compared with prediction, the control problem is more 
clearly defined, because the objective is to drive the 
process output or a particular variable to a specified set 
point. There are two basic issues: 

(i) Identification of the process which involves the 
modelling and the parameter estimation problems. 

(ii) Control calculation which involves computation of the 
control input optimizing a predefined cost criterion. 

While proper identification of a dynamic process is 
still a difficult problem, the adaptive control problem is 
comparatively well defined, and closed form solutions are 
possible, which can also absorb imprecision in identifica- 
tion to some extent. 

In this book, a special class of control methods known 
as predictive control is studied. 

The predictive control strategy incorporates predictions of 
the controlled variable or assumptions relating to the 
future values of parameters of the controller. The basic 
objective is to enhance robustness of the controller. In 
general, robustness and speed of response are two 
contradictory demands on a controller. In practice, however, 
there is greater emphasis on robustness, which is achieved
through the knowledge of the predictive controller, although 
at the cost of speed of response. 

In the present text two broad categories of predictive 
control are studied; these are based on input-output or 
transfer function models and on state-space models. The 
former is referred to as long range predictive control,
where the present time is used as the reference point from 
which predictions are computed, and a control law is 
determined to optimize the performance in the future. In the 
case of state-space model based predictive control (also 
known as Linear Quadratic (LQ) control, as a quadratic cost 
is minimized and a linear model of the process is 
considered), the terminal point in the future is used as the 
reference; the present control is determined so that the
terminal conditions are satisfied. 

Organization of the book 

The subject matter presented in this book is arranged into 
five broad groups: 

(1) Preparatory studies: Chapters 2 and 3.

(2) Prediction methods and applications for linear models:
Chapters 4 to 7.

(3) Prediction methods and applications for nonlinear
models: Chapters 8 to 11.

(4) Predictive control studies: Chapters 12 and 13.

(5) Smoothing and filtering aspects: Chapter 14.

The preparatory studies presented in Chapters 2 and 3 
overview the possible types of process models and the common 
methods for parameter estimation respectively. The models 
discussed are the transfer-function models based on the
input-output data, the models developed from frequency 
domain characterizations, and the structural models 
comprising the trend, the periodic component(s) and the 
random component. In real life most processes are better 
represented by stochastic models incorporating additional 
random components; so knowledge of stochastic processes, 
also featured in Chapter 2, is a prerequisite of the 
modelling and estimation studies. 

The parameter estimation studies presented in Chapter 3 
mainly concentrate on the method of the least squares. 
Particular emphasis has been laid on robust implementation 
through singular value decomposition. The problems of 
selection of the optimal model, and the assessment of the
validity of the model are also addressed. 

Chapter 4 presents some of the well studied methods of 
prediction which are popular with time series analysts and 
statisticians. The topics discussed include the exponential 
smoothing based methods and the Box and Jenkins methods. 

Chapter 5 presents the input-output model based 
predictors. The concept of constraining the prediction 
increments is introduced, which incorporates prior 
subjective knowledge about the underlying process into the 
predictor, and is expected to lead to robust prediction. 

The state-space modelling ideas are introduced in 
Chapter 6. The main attraction of a state-space approach is 
that the measured process variables as well as internal 
process variables which cannot be accessed can be 
incorporated in the model; a vast amount of thoroughly 
researched results are also available to the designer. A 
detailed study on the Kalman filter, the optimal linear 
filter, is included. The optimal state-space predictor is 
derived from the Kalman filter. 

Chapter 7 introduces the concepts of orthogonal
transformation and studies two particular types of
transformations: the singular value decomposition (SVD) and
the Walsh Hadamard transformation (WHT); while the former is
exceptionally robust numerically, the latter is extremely
simple from a computational point of view. SVD, which has so
far rarely been used in prediction applications, features
again and again in the present text. For example SVD is used
as an algebraic tool for matrix operations in Chapter 3, for 
the modelling and prediction of periodic series, 
quasiperiodic series and nonlinear input-output processes in 
Chapters 8 to 11, and used for filtering in Chapter 14. The 
application of SVD and WHT for modelling and prediction of 
nearly periodic time series is explored in Chapter 7. 

Most real life processes are nonlinear to varying 
extents. Chapter 8 starts with a discussion on the basic 
features characterizing nonlinearity of a process, which is 
followed by assessment of nonlinear periodicity through
state-space diagrams. A review of some of the popular 
methods of modelling and prediction of nonlinear time series 
is also presented. 

Chapters 9 and 10 concern hierarchical models for
nonlinear time series or input-output processes. The Group
Method of Data Handling (GMDH) presented in Chapter 9 and
the neural network models discussed in Chapter 10 are both
powerful methods. In both cases, multiple stages of
simple nonlinearity are used to develop complex nonlinear
models. It has been shown that in both cases application
of SVD and QR with column pivoting (QRcp) factorization can
lead to parsimonious designs with improved performance. The
neural-net models are comparatively more versatile and
powerful. The parameters of the network may be estimated
using a nonlinear optimization method. Applications with the
network operating both with time-domain data and
(orthogonally) transformed data are presented.

Chapter 11 is devoted to the modelling and prediction 
of quasiperiodic series using SVD. Two basic approaches are 
explored. In the first approach a quasiperiodic series is 
decomposed into relatively periodic component series (having
the same period length) in hierarchical levels of
nonlinearly transformed spaces; the periodic components are
individually modelled linearly or modelled using neural
networks. In the other approach, the series is decomposed
into multiple periodic components (having different period
lengths) in the time domain; each periodic component is
expressed by a linear model. Both approaches can produce
a prediction of one complete period.

The information conveyed through prediction is often
used for control either directly or indirectly. If the
process concerned can be controlled, predictive information
or assumptions can be directly used in the design of
predictive controllers. Two broad categories of predictive
controllers are considered, namely, the input-output model
based controllers and the state-space model based
controllers, which are discussed in Chapters 12 and 13
respectively. The predictive information is incorporated in
these two approaches differently. In the input-output model
approach, one-step to multistep prediction is used with the
present time as the point of reference; in the case of the
state-space approach, a terminal point in the future is used as
the reference.

Chapter 14 is devoted to the smoothing and filtering 
studies which are particularly important for prediction and 
control problems. The property of the real-timeliness of the 
data has been given due consideration. The problems 
addressed include low-pass filtering without phase-shifts, 
extraction of signals from noise corrupted data, and 
estimation of pattern for nearly repetitive processes. 
Lastly a number of Appendices are included which
support the subjects covered in the various Chapters; it is 
hoped that the Appendices will aid the comprehension of the 
readers as well as assist implementational efforts. 

Special Features 

In the widely studied field of adaptive prediction and 
control, this book presents potential concepts, and reviews 
newly developed methods for the modelling and prediction of 
nearly periodic and quasiperiodic series and complex 
input-output processes as well as numerically robust and 
computationally efficient implementation ideas for almost 
all the presented methods of prediction and control. In 
addition, efforts are made to bring together the concepts 
and practices popular with diverse disciplines. The studies 
presented incorporate due consideration of real life
problems, which have been explored using illustrative 
examples and case studies. 
CHAPTER 2


PROCESS MODELS 


Modelling concerns the mathematical representation of 
the nature of the process with respect to its 
environment; the purpose of modelling and the type of 
data available are important considerations. 


2.1 INTRODUCTION 

The understanding and study of any process requires a
mathematical representation or model of the process. The 
process may be an input-output process, or a time series 
(i.e. an apparent output only process). The model is based 
on the prior physical or subjective knowledge about the 
process, the measured data on the inputs and the outputs of 
the process, and the physical and engineering laws governing 
the working of the process. The primary requirements of a 
model are (i) representativeness and (ii) long-term 
validity. 

If the model is a complete and exact representation of 
the process, it is called a deterministic model, and the 
process is called a deterministic process. The parameters of 
such a model are precisely known, and the model can be used 
to produce exact prediction of the process response from the 
past data. However, most real-life processes cannot be
represented by deterministic models, because of the dynamic 
nature of the process and the noise (meaning lack of 
information) and other uncertainties being associated with 
the available data; so, the description of the process by 
the model can only be probabilistically close to the actual 
process. A model which incorporates noise or disturbance 
terms to account for such imprecision in the knowledge of 
the process is called a stochastic model. 

Modelling involves selection of the process variables 
to be considered, selection of the class of model, selection 
of the model structure, estimation of the parameters of the 
model and testing of the validity of the model. There has to 
be economisation in the degree of complexity of the model,
as otherwise the overall validity of the model tends to
suffer; it is desirable that the model is as simple as
practicable. If the characteristics of the process change
with time, the parameters of the model are to be estimated
recursively; such a model is referred to as an adaptive
model.
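
Recursive estimation can be sketched with recursive least squares
using a forgetting factor, so that old data are gradually
discounted. This is only one of several possible schemes, not
necessarily the formulation used later in this book. The
first-order input-output process, the parameter change from 0.8 to
0.4, the forgetting factor 0.98 and the names rls_step and run_demo
are hypothetical choices for illustration. The plain covariance
update is used for brevity; as noted in Chapter 1, UD updating is
computationally more stable.

```python
import numpy as np

def rls_step(theta, P, phi, y, lam):
    """One recursive least squares update with forgetting factor lam."""
    err = y - phi @ theta                      # one-step prediction error
    gain = P @ phi / (lam + phi @ P @ phi)     # estimator gain vector
    theta = theta + gain * err                 # parameter update
    P = (P - np.outer(gain, phi @ P)) / lam    # direct covariance update
    return theta, P

def run_demo(n=600, change_at=300, lam=0.98, seed=1):
    """Track the parameters of y(k) = a*y(k-1) + b*u(k-1) + e(k)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)              # estimates of [a, b]
    P = 1000.0 * np.eye(2)           # large initial covariance
    y_prev, u_prev = 0.0, 0.0
    history = []
    for k in range(n):
        a = 0.8 if k < change_at else 0.4      # process changes with time
        y = a * y_prev + 1.0 * u_prev + rng.normal(scale=0.1)
        phi = np.array([y_prev, u_prev])       # regressor vector
        theta, P = rls_step(theta, P, phi, y, lam)
        history.append(theta.copy())
        y_prev, u_prev = y, rng.normal()       # new random input
    return np.array(history)

history = run_demo()
print("estimate of a just before the change:", history[299, 0])
print("estimate of a at the end of the run: ", history[-1, 0])
```

A forgetting factor closer to unity gives smoother estimates but
slower adaptation; the appropriate value depends on how fast the
process changes.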

Throughout this book, models of different categories
have been used for various types of applications, an
introductory summary of which is presented in Sec. 2.2. This
is followed by a study on stochastic processes in Sec. 2.3;
knowledge of stochastic processes is important, because most
real-life processes are stochastic in nature. Next, two
important classes of models are discussed. The transfer
function models are studied in Sec. 2.4, and the models based
on frequency domain analysis are described in Sec. 2.5, where
data sampling aspects of measurements are also discussed.
The structural properties of processes like the trend and
the seasonality etc. can be used for configuring a model, as
discussed in the context of structural modelling in Sec. 2.6.
Structural modelling through periodic decomposition has also 
been introduced. 


2.2 PROCESS MODELS AND THEIR CHOICE 

For representative modelling, the choice of the model, the 
estimation of parameters and the testing for the validity of 
the model are all equally important. Parameter estimation 
and model validation aspects are discussed in Chapter 3; 
this section discusses the candidate classes of models and 
the considerations leading to the choice of the models. 


2.2.1 Classes of Models 

Five broad classes of discrete-time models are discussed; 
these are 

(1) time series and transfer-function models, 

(2) models based on trigonometric functions, 

(3) state-space models, 

(4) models based on orthogonal transformations, 

(5) hierarchical models including GMDH and neural networks. 
Figure 2.2.1 Monthly rainfall pattern in India over
the period 1940 to 1949 (Appendix 7F); horizontal axis: months.

A common feature of these models is that they can all
accommodate a certain degree of uncertainty, and can adapt
to time-varying process dynamics. The modelling of some
processes may require incorporating features of more than
one class. An outline of the stated classes of models
follows, and is summarized in Table 2.2.1.

Time series and transfer-function models 

A time series is a sequence of observations on a variable of
the process. It may or may not have a periodic component
associated with it; typical examples are the monthly
rainfall pattern over the years (Fig. 2.2.1), and the
variations in the rate of rotation of the earth (Fig. 2.2.2).

Time series may be represented by AR, IAR, ARMA, ARIMA
etc. type models, which are based on polynomial operators in
discrete time; these models are discussed in Sec. 2.4. If a
time series shows structural features (like trend and
periodicity), the structural components may be separately
modelled. Structural modelling features in Sec. 2.6.

Remarks 

(a) Here as well as elsewhere in this book, a time series
has often been referred to as a ‘process’.

(b) The reference to ‘time’ in time series is not a
limitation. The studies on time series also apply to
sequences in space.
Figure 2.2.2 The yearly variations in the rotation
rate of the earth in units of 10 seconds (Appendix 8B). 

Transfer-function models are natural extensions of time 
series models. It is expected that the process in question 
is subjected to certain external inputs, which influence the 
output of the process; for example, the temperature (output) 
of a furnace varies with the change in the fuel-gas flow 
(input) into the furnace. Transfer-function models have 
additional terms for the exogenous input(s), as in the
ARMAX (that is ARMA with eXogenous input(s)), and ARIMAX
models; these models are discussed in Sec. 2.4.

The transfer-function models have been widely used in 
this book. 

Models based on trigonometric functions 

Processes with regular or irregular periodicity can be 
analysed in frequency domain and can be modelled in terms of 
components, expressed as trigonometric functions. Besides 
modelling, frequency domain characterization provides useful 
information in the design of filters as well as in assessing 
the appropriate rate of sampling of continuous-time signals 
for discrete time modelling. A detailed study of frequency 
domain analysis and modelling based on trigonometric 
functions is in Sec.2.5. 

State-space models 

State-space models have the unique feature that along with
variables which are known or can be measured, the variables
which are internal to the process and cannot be measured are
also incorporated into the model; this is why a state-space
model is also called an internal model, whereas a model
based on measurable variables is called an external model.
For example, consider the problem of modelling the internal
temperature variations of steel ingots (discussed in
Sec. 6.8) while being heated inside a soaking pit (or
furnace) before rolling. Here only the temperature inside
the furnace is measurable, whereas the ingot surface
temperature and the ingot-centre temperature are not
measurable; all the three variables are considered as state
variables in the state-space model of the furnace. The
fuel-gas flow to the furnace is regarded as the exogenous
input in the model.

Any transfer-function or time series model can have a 
state-space representation but the converse is not true. 
Processes with or without periodicity can be modelled by 
state-space models. Chapter 6 is devoted to the study of 
state-space models and their applications. State-space
formulation of LQ control features in Chapter 13. The
state-space models for optimal smoothing are presented in
Sec. 14.2.


Models based on orthogonal transformations 

In this book there is particular emphasis on the use of 
singular value decomposition (SVD) for modelling. The 
special feature of SVD is that it results in optimal 
compaction of information (as discussed in Sec.7.6). 

The models based on SVD are particularly suitable for 
time series which are nearly periodic (for example see 
Fig.2.2.1), or quasiperiodic in nature (for example the 
yearly averaged sunspot series as shown in Fig.2.2.3). 

The principle of modelling for the nearly periodic
series is that the consecutive periods are aligned into 
consecutive rows of a matrix, which is SV-decomposed; the 
decomposed components are now modelled, typically as a time 
series. This subject is treated in detail in Secs.7.7-7.8. 
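
The alignment of consecutive periods into the rows of a matrix can
be sketched as follows. The synthetic series (a fixed 12-point
pattern whose amplitude varies from period to period, plus small
noise) and all variable names are assumptions for illustration; the
detailed scheme of Secs. 7.7-7.8 is not reproduced here.

```python
import numpy as np

period = 12        # samples per period (e.g. months in a year)
n_periods = 10     # number of complete periods available

# Hypothetical nearly periodic series: a repeating pattern with a
# period-to-period scaling and a small amount of noise.
rng = np.random.default_rng(0)
pattern = 1.0 + np.sin(2.0 * np.pi * np.arange(period) / period)
scales = 1.0 + 0.2 * rng.standard_normal(n_periods)
noise = 0.05 * rng.standard_normal((n_periods, period))
series = (np.outer(scales, pattern) + noise).ravel()

# Align consecutive periods into consecutive rows of a matrix.
A = series.reshape(n_periods, period)

# Singular value decomposition of the period matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
energy = s ** 2 / np.sum(s ** 2)

# Dominant component: one pattern (right singular vector) and one
# scaling value per period (left singular vector times singular value).
pattern_est = Vt[0]
scaling_per_period = U[:, 0] * s[0]
A1 = np.outer(scaling_per_period, pattern_est)   # rank-one approximation

print("energy fraction in first component:", energy[0])
print("scaling sequence over the periods: ", scaling_per_period)
```

For a nearly periodic series most of the energy falls in the first
component. The sequence of scaling values, one per period, is the
kind of decomposed component that can then be modelled as a time
series to predict a complete period ahead.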

A quasiperiodic series can be decomposed into components
which are individually nearly periodic, and hence can
be modelled the same way as above. Such a modelling scheme
features in Sec. 11.4.

The two attractive features of SVD based modelling are
that (i) one or multiple period ahead prediction may be
produced, and (ii) SVD, which is extremely robust
numerically, ascribes robustness to the model.

Figure 2.2.3 The series of yearly averaged sunspot
numbers (Appendix 8A); horizontal axis: years.

Hierarchical or multilayer models 

These models are primarily suitable for time series and
input-output processes with nonlinearity; quasiperiodic 
processes can also be modelled. The three types of models 
studied are: 

(1) models based on Group Method of Data Handling (GMDH), 

(2) neural network models, and 

(3) models based on singular value decomposition with or 
without nonlinear transformation.

All these models have hierarchical stages or layers, where 
each stage incorporates simple elements of nonlinearity. 
Since most processes contain a certain degree of nonlinear- 
ity, these models are applicable for nonlinear as well as 
nearly linear (and nearly periodic) processes. 

Typical processes that can be modelled are multi-input 
single output processes like the economic inflation process, 
quasiperiodic processes like the yearly averaged sunspot 
series etc. Nearly periodic processes like the homogeneous 
monthly rainfall series may also be modelled using such
hierarchical models. Chapters 9, 10 and 11 are devoted to 
the study of hierarchical models. 
Table 2.2.1 Summary of the main classes of models studied

Model type              Type of process           Features and              Reference
                        modelled                  applications
----------------------  ------------------------  ------------------------  ---------
State-space models      Processes with or         Internal models;          Ch. 6
(based on state         without periodicity,      variables modelled may
variables, input and    linear or nonlinear       or may not be
output variables and    processes, output only    measurable.
noise)                  or input-output           p-step ahead prediction,  Ch. 6,
                        processes.                LQ control,               Ch. 13,
                                                  smoothing and filtering.  Ch. 14

Orthogonal              Periodic processes        Periodic modelling,       Ch. 7
transformation          modelled using SVD.       one period or one
(singular value         Quasiperiodic processes   pseudo-period ahead
decomposition)          modelled using            prediction,               Ch. 11
based models            a) SVD and nonlinear      pattern decomposition
                        transformation with       and pattern extraction,   Ch. 11,
                        linear modelling,                                   Ch. 14
                        b) SVD and nonlinear      smoothing and
                        transformation with       filtering.                Ch. 14
                        nonlinear modelling,
                        c) SVD and multiple
                        pattern decomposition
                        with linear modelling.

Time series/            Data sequence or          p-step ahead prediction,  Ch. 4, 5
transfer-function       output only process/      constrained prediction,   Ch. 5
models                  input-output process      predictive control of
                        with or without           input-output processes.   Ch. 12
                        periodicity.

Frequency domain        Processes with            Signal analysis,          Ch. 2, 6
analysis based models   periodicity.              modelling and
                                                  prediction, filtering.

Multi-layer models      Periodic and              p-step ahead prediction,  Ch. 9,
(GMDH and neural        quasiperiodic processes   one period ahead          Ch. 10,
network models)         with nonlinearity,        prediction.               Ch. 11
                        input-output processes.
Remarks

(a) In all the cases, efforts are made to develop
parsimonious models, i.e. only the essential number of
variables are to be included in the model, and the model
order is kept as low as possible.

(b) The degree of accuracy of the data should be duly
considered. In case of real-life applications, the model
needs to be protected against irrelevant information
influencing the model.

(c) The underlying assumptions in building the model as 
well as the limitations of the model should be clearly 
stated and verified. 


2.2.2 Choice of models 

Three basic attributes of a process which influence the 
choice of the type of model are linearity, periodicity, and 
stationarity. 

Linearity: No process is exactly linear, but acceptable
solutions may be obtained by considering the process
piece-wise linear, or locally linear (that is, linear around
the operating point). In case of nonlinearity, efforts may
be made to linearize the representation of the process
through nonlinear transformation. If linearization is
not possible, or if it is thought that a nonlinear model
will be more representative, a nonlinear model is used.
A linear model is preferred because (a) it is
computationally simpler, and (b) the desired statistical
features for nonlinear models are not fully established.

Periodicity: Some processes show periodic variations. A 
periodic process is characterized by three components: the 
length of the period, the pattern which repeats over the 
periods, and the relative magnitude of the patterns. When 
the length of the period as well as the pattern remains 
almost unchanged, a linear model may be produced. If either
or both of the period length and the pattern vary, the
process is called a quasiperiodic process, and the process
can be modelled as a nonlinear periodic process or as a
combination of periodic processes (see Sec. 8.3 and Chapter
11).

Stationarity: This is the property of the process charac- 
teristics remaining unchanged with time (or space). If the 
process is not stationary, its degree of nonstationarity may 
be decreased by suitable transformations (e.g., logarithmic 
transformation). A relatively less stationary process will
require more frequent adaptation. The characteristics of
stationary processes are discussed in Sec. 2.3.1.

An exhaustive study of the possible classes of models and
the criteria for their choice is beyond the scope of this
book. The main considerations are:

(a) whether a periodic or quasiperiodic model is desired, or
whether the model is to be nonperiodic in nature,

(b) whether a linear or a nonlinear model is to be designed,

(c) whether one or more variables of interest are
inaccessible or unmeasurable,

(d) whether the model is to be deterministic or stochastic
in nature, etc.

Remarks 

(a) The modelling exercise is greatly simplified if the
process is linear or periodic and stationary. In real life,
most processes are not so, and efforts are made to
preprocess the data to increase the degree of stationarity,
periodicity or linearity prior to modelling. The
preprocessing can involve differencing of the data, nonlinear
transformations (Sec. 8.2.3), smoothing or filtering
(discussed in Chapter 14), orthogonal transformation
(discussed in Chapter 7) or other linear transformations.

(b) Between different contending classes of models, the 
choice will depend on the purpose of modelling, the quality 
of data available, and the computational preferences of the 
user. 


2.3 STOCHASTIC PROCESSES 

The word stochastic is of Greek origin, meaning to guess. A 
stochastic process is a random process evolving in time 
whose behaviour can be analysed statistically but cannot be 
predicted precisely. 

The uncertainty or the unknown disturbance in the 
observed process is described by a stochastic process and 
because of the presence of this uncertainty, the overall 
process is also called a stochastic process. 

The more accurate the statistical representation of the 
uncertainty in the process model, the higher the probability 
is of the model response being close to the actual response 
of the process. Hence it is necessary to study the 
stochastic nature of the disturbance processes.


2.3.1 Basic Concepts and Processes 

Some of the basic concepts and some of the basic types of 
processes are discussed here. 

Stochastic processes 

A stochastic process x(t) is a family of random
variables $\{x(t),\ t \in T\}$, where the random variables are
indexed with the parameter t, all of whose values lie in an
index set T. The parameter t is usually interpreted as time
(although this is not a limitation). The random variables
may be scalars or vectors.

With the notion of time as the index, if the time index
set is defined as $T = \{t : 0 \le t < \infty\}$ or
$T = \{t : t \ge t_0\}$, the process is called a
continuous-time stochastic process. On the other hand, if T
is a set of discrete time instants,
$T = \{k+i : i = \ldots, -1, 0, 1, \ldots\}$, the process x(k)
is called a discrete time stochastic process. The sampling
instants are distinct but not necessarily equispaced.

Stationary processes 

A stochastic process x(k) is said to be strict-sense
stationary, if its probability density functions are
independent of the shift in the time origin; that is, if the
two processes x(k) and $x(k+\tau)$ have the same probability
density functions for any $\tau$, they constitute a
strict-sense stationary process.

The processes x(k) and y(k) are jointly stationary, if
the joint statistics of x(k), y(k) are time-invariant, i.e.
the joint statistics are the same as those of $x(k+\tau)$,
$y(k+\tau)$, for any $\tau$.

The process x(k) is said to be wide-sense (or weak-sense)
stationary, if its mean value is constant and its
autocorrelation R depends only upon the time difference
$\tau$, expressed as

  $E\{x(k)\} = \bar{x}$, a constant,              (2.3.1a)

  $E\{x(k+\tau)\,x(k)\} = R(\tau)$.                (2.3.1b)

Wide-sense stationarity, unlike strict-sense stationarity,
does not require the probability density function to be
invariant.

The stationary processes for which time averages equal 
probabilistic averages, (i.e. ensemble average or expecta- 
tion) are called ergodic. Not all stationary processes are 
ergodic. 

If the discrete time process $\{x(k),\ k = 1, 2, \ldots, N\}$ is
ergodic, it can be completely described by its expectation

  $\bar{x} = E\{x(k)\} = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} x(k)$,

and by its autocorrelation function

  $R(\tau) = E\{x(k)\,x(k+\tau)\} = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} x(k)\,x(k+\tau)$.
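
For a finite record, the two limits above can only be approximated by averaging over the available samples. The following is a minimal sketch of such time-average estimates; the function names and the test record (independent Gaussian samples with mean 2 and unit variance) are illustrative assumptions, not part of the text.

```python
import random

def time_average_mean(x):
    """Time-average estimate of the mean: (1/N) * sum of x(k)."""
    return sum(x) / len(x)

def time_average_autocorr(x, tau):
    """Time-average estimate of R(tau) = E{x(k) x(k+tau)}.

    Only the N - tau products that lie inside the record are averaged.
    """
    n = len(x) - tau
    return sum(x[k] * x[k + tau] for k in range(n)) / n

# Illustrative record (an assumption): independent Gaussian samples
# with mean 2 and unit variance.
rng = random.Random(0)
x = [rng.gauss(2.0, 1.0) for _ in range(20000)]

mean_hat = time_average_mean(x)        # expected to be near 2
r0_hat = time_average_autocorr(x, 0)   # expected near variance + mean^2 = 5
r5_hat = time_average_autocorr(x, 5)   # expected near mean^2 = 4
```

For this record the samples are independent, so the estimated autocorrelation at any nonzero lag should approach the squared mean as the record length grows.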

Remark: A covariance stationary process {x(k)} may be
decomposed as $x(k) = x_d(k) + x_n(k)$, where $x_d$ is a purely
deterministic component, and $x_n$ is a purely stochastic
component; such a decomposition is known as Wold's
decomposition (Wold, 1954).

Markov processes 

The Markov property states that the future depends on the
knowledge of the present, and not on the knowledge of the
past. A stochastic process $\{x(k),\ k \in T\}$ is called a
Markov process, if

  $P\{x(k+1) \mid x(k), x(k-1), \ldots, x(k_0)\} = P\{x(k+1) \mid x(k)\}$;   (2.3.2)

in other words, the conditional (or transitional)
probability density function for x(k+1) depends only on its
present value x(k) and not on any value in the past; here
the time instants $k_0 < \ldots < (k-1) < k < (k+1)$ belong to
the set T. Thus the concept of probabilistic causality (see
Appendix 12) is inherent in the Markov process.

The joint probability density function of the Markov
sequence (scalar or vector $x(k+1), x(k), x(k-1), \ldots, x(k_0)$)
is completely specified in terms of the initial probability
density function $P\{x(k_0)\}$ and the transition density
function $P\{x(k+1) \mid x(k)\}$. Following (2.3.2), the joint
probability density function is given by

  $P\{x(k+1), x(k), x(k-1), \ldots, x(k_0)\}$

  $= P\{x(k+1) \mid x(k)\}\,P\{x(k) \mid x(k-1)\} \cdots P\{x(k_0+1) \mid x(k_0)\}\,P\{x(k_0)\}$.
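
The factorization can be illustrated with a finite-state analogue, where densities become probabilities. The sketch below is only an illustration under that assumption; the two-state chain, its state names and its numerical values are hypothetical.

```python
def markov_joint_probability(states, initial, transition):
    """Joint probability of a state sequence of a first-order Markov chain.

    states     : observed sequence x(k0), x(k0+1), ..., x(k+1)
    initial    : dict, initial[s] = P{x(k0) = s}
    transition : dict of dicts, transition[s][t] = P{x(j+1) = t | x(j) = s}
    """
    p = initial[states[0]]
    # Multiply by one transition probability per consecutive pair.
    for prev, nxt in zip(states, states[1:]):
        p *= transition[prev][nxt]
    return p

# Hypothetical two-state chain (the numbers are illustrative only).
initial = {"dry": 0.6, "wet": 0.4}
transition = {
    "dry": {"dry": 0.8, "wet": 0.2},
    "wet": {"dry": 0.5, "wet": 0.5},
}

# P{dry, dry, wet} = P{dry} * P{dry | dry} * P{wet | dry}
p = markov_joint_probability(["dry", "dry", "wet"], initial, transition)
```

Only the initial distribution and the one-step transition probabilities are needed, which is the point of the factorization.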

The Markov process as defined above is sometimes called the
first order Markov sequence. The underlying concept may be
extended and a second order Markov process may be defined as
the one requiring the two most recent information elements
to describe the future, i.e. with $(k+1) > k > \ldots > k_0$,

  $P\{x(k+1) \mid x(k), x(k-1), \ldots, x(k_0)\} = P\{x(k+1) \mid x(k), x(k-1)\}$.   (2.3.3)

Higher order Markov processes may also be defined the same
way.

Remarks 

(a) A subset of a Markov sequence is also a Markov sequence. 

(b) If a Markov sequence is reversed in time, it still
retains the Markov property:

  $P\{x(k+1) \mid x(k+2), \ldots, x(k+n)\} = P\{x(k+1) \mid x(k+2)\}$,

where $(k+1) < (k+2) < \ldots < (k+n)$; all belong to the set T.

Gaussian distribution 

A random variable x is called a Gaussian or normally
distributed random scalar variable with mean $E\{x\} = \bar{x}$, and
variance $E\{(x-\bar{x})^2\} = \sigma^2$, if its probability density
function is given by

  $P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\bar{x})^2}{2\sigma^2}\right)$.   (2.3.4)

The expression 'x is $N(\bar{x}, \sigma^2)$' means, x is Gaussian (normal)
with mean $\bar{x}$ and variance $\sigma^2$.

A random vector is called a Gaussian or normally
distributed random n-vector $x = [x_1, x_2, \ldots, x_n]^T$, if its
probability density function is given by

  $P(x) = \frac{1}{(2\pi)^{n/2} (\det P)^{1/2}} \exp\left[-\frac{1}{2}(x-\bar{x})^T P^{-1} (x-\bar{x})\right]$,   (2.3.5)

where the mean vector $\bar{x}$ and the covariance matrix P are
given by

  $\bar{x} = E\{x\}$, $P = E\{(x-\bar{x})(x-\bar{x})^T\}$.

Thus the mean and the covariance uniquely describe the
Gaussian probability density function.
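
As a numerical check on (2.3.4) and (2.3.5), the densities can be evaluated directly. The sketch below is restricted to n = 2 so that the matrix inverse can be written out by hand; the function names are assumptions made for this illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Scalar Gaussian density (2.3.4) with mean `mean` and variance `var`."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gaussian_pdf_2d(x, mean, P):
    """Gaussian density (2.3.5) for n = 2; P is a 2x2 covariance (nested lists)."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    if det <= 0.0:
        # A singular P has no density; see the remark on singular covariance.
        raise ValueError("P must be nonsingular for the density to exist")
    # Inverse of a 2x2 matrix written out explicitly.
    inv = [[P[1][1] / det, -P[0][1] / det],
           [-P[1][0] / det, P[0][0] / det]]
    d = [x[0] - mean[0], x[1] - mean[1]]
    quad = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    # (2*pi)^(n/2) equals 2*pi for n = 2.
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# With a diagonal covariance the joint density factorizes into scalar densities.
joint = gaussian_pdf_2d([1.0, -0.5], [0.0, 0.0], [[4.0, 0.0], [0.0, 0.25]])
product = gaussian_pdf(1.0, 0.0, 4.0) * gaussian_pdf(-0.5, 0.0, 0.25)
```

The agreement of `joint` and `product` reflects the fact that uncorrelated Gaussian components are independent.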

A stochastic process $\{x(k),\ k \in T\}$ is called a Gaussian
or normal process, if for any n points $k_1, k_2, \ldots, k_n$ in T,
the random variables,

  $x(k_1), \ldots, x(k_n)$,   (2.3.6)

are jointly of Gaussian or normal distribution. The
probability law for the joint density function of the Gaussian
stochastic process (2.3.6) is completely specified by
two parameters, the mean and the autocorrelation
function, given by

  $E\{x(k)\} = \bar{x}(k)$, for all $k \in T$,

  $E\{x(k_1)\,x(k_2)\} = P(k_1, k_2)$, for all $k_1, k_2 \in T$.

Remarks 

(1) If P, the covariance matrix of x in (2.3.5), is 
singular, the Gaussian property of x cannot be defined by 
the probability density function; the characteristic 
function can be used in that case. 

(2) A stochastic process is called a Gauss-Markov sequence, 
if it has Gaussian distribution and at the same time is a 
Markov sequence. 

(3) (a) A subset of a Gaussian vector is also Gaussian. 

(b) Gaussian variables retain their Gaussian character 
under linear transformation. 

White noise processes 

A random sequence $\{x(k_n), x(k_{n-1}), \ldots, x(k_2), x(k_1)\}$ is said
to be a purely random or white noise sequence, if $x(k_i)$ and
$x(k_j)$ are completely independent for $i \ne j$. For such
processes, the conditional density is the same as the
marginal density:

  $P\{x(k_n) \mid x(k_{n-1})\} = P\{x(k_n)\}$.

The implication is that (a) the white noise sequences do not
possess any memory and (b) the present is independent of the
past, while the future is independent of the present.

The autocorrelation function is given by

  $E\{x(k_i)\,x(k_j)\} = P\,\delta_{ij}$,

where $\delta_{ij}$ is the Kronecker delta function:

  $\delta_{ij} = 1$ for $i = j$,
  $\delta_{ij} = 0$ for $i \ne j$.

If the white noise process has a Gaussian distribution, it
is called a white Gaussian process, which is completely
specified by the mean and the covariance matrix. A widely
studied process driven by white Gaussian noise is the
Brownian motion or Wiener process (Sec. 2.3.2), whose
increments are white and Gaussian. Usually the white noise
process is zero mean, and if stationary, its power spectrum
will be constant.

Remark: Although in signal processing, the term noise 
implies ‘lack of information or signal’, white noise is an 
important entity. Because of its unique statistical 
properties, white noise is often used as the input signal for 
process identification. 


2.3.2 Examples of Common Processes 

Some of the commonly occurring stochastic processes are 
Brownian motion or Wiener process, Poisson process and 
Random walk. All these processes are characterized by being 

(a) Markov processes and 

(b) independent increment processes, 

where an independent increment process {x(t)} has the
property that for $t_1 < t_2 < \ldots < t_n$, the differences,

  $(x(t_2)-x(t_1)), (x(t_3)-x(t_2)), \ldots, (x(t_n)-x(t_{n-1}))$,

are mutually independent.

Brownian motion (Wiener process) 

A microscopic particle in a fluid moves in an erratic
fashion due to random collisions with other particles and
frictional resistance between collisions. This phenomenon is
called Brownian motion after the botanist Robert Brown, who
observed it in 1827. Brownian motion is also called the
Wiener process, as this process was rigorously analysed by
Norbert Wiener.

Any specified time interval r to s is expected to be
much larger than the time between two collisions and hence
the movement of the Brownian particle is the resultant of a
large number of random movements.
Figure 2.3.1 A typical Brownian motion series y(k).

The Wiener process is a stochastic process with
stationary increments, {x(s)-x(r)}, which are normally
distributed with zero mean and variance proportional to the
time difference (s-r):

  $E\{x(s)-x(r)\} = 0$,

  $E\{(x(s)-x(r))^2\} = \sigma^2 |s-r|$,

where $\sigma^2$ is a positive constant. Thus the probability
density of displacement from time r to s is the same as
from time $(r+\tau)$ to $(s+\tau)$, since the density depends upon
the length of the time interval and not on specific time
reference.

Fig. 2.3.1 shows a Brownian motion series {y(k)}
generated by passing a white Gaussian noise sequence {x(k)}
through an integrator, implemented as $x(k)/(1-q^{-1})$ (where
$q^{-1}$ is a unit discrete time backward shift operator, e.g.,
$q^{-1}y(k) = y(k-1)$):

  y(k) = y(k-1) + x(k).
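
The integrator recursion is easy to simulate. The sketch below generates such a series and checks, over an ensemble of realizations, that the mean square increment grows in proportion to the time difference; the function names, the zero initial level, the seeds and the number of runs are illustrative assumptions.

```python
import random

def brownian_series(n, sigma=1.0, seed=0):
    """Return y(0..n-1) with y(k) = y(k-1) + x(k), x(k) white Gaussian.

    The level before the first sample is assumed to be zero.
    """
    rng = random.Random(seed)
    y, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, sigma)  # add one white Gaussian increment
        y.append(level)
    return y

def mean_square_increment(r, s, runs=4000, sigma=1.0):
    """Ensemble average of (y(s) - y(r))^2 over independent realizations."""
    total = 0.0
    for seed in range(runs):
        y = brownian_series(s + 1, sigma, seed)
        total += (y[s] - y[r]) ** 2
    return total / runs

# For sigma = 1 the result should be near |s - r| = 25.
msi = mean_square_increment(5, 30)
```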

Poisson process 

A Poisson process x(t) is an integer valued stochastic
process, consisting of (mostly nondecreasing) jumps of
unit magnitude, occurring at random time intervals.
$x(t_1)-x(t_2)$ equals the number of events that occurred in
the time interval $(t_2, t_1)$. The probability of m
$(= x(t_1)-x(t_2))$ events in the time interval of length $\tau$
$(= t_1-t_2)$ is given by

  $P\{m, \tau\} = \frac{(\lambda\tau)^m}{m!}\,e^{-\lambda\tau}$,   (2.3.7)

where $\lambda\tau$ is the mean: $E\{x(t_1)-x(t_2)\} = \lambda(t_1-t_2) = \lambda\tau$.

Figure 2.3.2 A Poisson process, x(t) plotted against t.

For example, the number of telephone calls arriving at a
switchboard is a Poisson process, with the numbers of calls
in non-overlapping time intervals being independent
increments having the distribution given by (2.3.7).
Fig. 2.3.2 shows a typical Poisson process.
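
The distribution (2.3.7) can be evaluated directly. A minimal sketch follows; the function name and the rate and interval values are illustrative assumptions.

```python
import math

def poisson_probability(m, lam, tau):
    """P{m, tau} of (2.3.7): probability of m events in an interval of length tau."""
    mean = lam * tau
    return mean ** m * math.exp(-mean) / math.factorial(m)

# Illustrative values: rate of 2 events per unit time, interval of length 1.5.
lam, tau = 2.0, 1.5
probabilities = [poisson_probability(m, lam, tau) for m in range(60)]

total = sum(probabilities)                                     # close to 1
mean_count = sum(m * p for m, p in enumerate(probabilities))   # close to lam * tau
```

The truncation at 60 terms is arbitrary; for the values used here the neglected tail is negligible.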

Random walk 

Random walk is a discrete time process with magnitude 
randomly jumping by +1 or -1 (or by +L, -L, where L is a 
constant) at each periodic instant. 

The position x(nT) at the n-th period is

  $x(nT) = x_1 + x_2 + \ldots + x_n$,

where T is the time interval, and $\{x_i,\ i = 1, \ldots, n\}$ is a
family of independent and identically distributed random
variables assuming values +1 or -1. Thus the position at any
discrete time is obtained by integrating the random
variables. Hence

  $x(nT) - x(nT-T) = x_n$, that is $(1-q^{-1})\,x(nT) = x_n$,

where $q^{-1}$ is a discrete time backward shift operator:
$q^{-1}y(t) = y(t-T)$. It can be shown that

  $E\{x(nT)\} = 0$,

and, for steps of magnitude L,

  $E\{x^2(nT)\} = nL^2$.

Figure 2.3.3 A typical random walk sequence.

Fig. 2.3.3 shows an example of a random walk sequence.
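
The second-moment property of the random walk can be checked by simulation. The sketch below assumes equally likely steps of +L and -L; the function names, seed and number of runs are illustrative choices.

```python
import random

def random_walk(n, L=1.0, rng=None):
    """Positions x(T), x(2T), ..., x(nT) for equally likely steps of +L or -L."""
    if rng is None:
        rng = random.Random()
    position, path = 0.0, []
    for _ in range(n):
        position += L if rng.random() < 0.5 else -L
        path.append(position)
    return path

def mean_square_position(n, L=1.0, runs=5000, seed=1):
    """Ensemble average of x^2(nT) over independent walks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        total += random_walk(n, L, rng)[-1] ** 2
    return total / runs

# With n = 50 and L = 2 the average should be near n * L^2 = 200.
msp = mean_square_position(50, L=2.0)
```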


2.4 TRANSFER-FUNCTION MODELS 

Mainly three types of variables constitute a transfer- 
function model: y, the measured output of the process, u, 
the known (or measured) input to the process, and e, the 
noise or uncertainty in the model. 


2.4.1 Some Basic Models 

Autoregressive (AR) model 

These models can be expressed as

  $y(k) + a_1 y(k-1) + \ldots + a_n y(k-n) = e(k)$,   (2.4.1a)

where $a_1, \ldots, a_n$ are the model parameters; y(k) refers to
the measurement of the output at time k. y(k-n), y(k-n+1),
..., y(k-1) are measurements of the output at successive
time instants in the past; for example in the case of
monthly data, these are data for successive months. e is
referred to as the noise or the disturbance; it accounts for
the errors in the measurements, the unaccounted for
disturbances acting on the process and the modelling error.

The model (2.4.1a) can also be expressed as

  $A(q^{-1})\,y(k) = e(k)$,   (2.4.1b)

where

  $A(q^{-1}) = 1 + a_1 q^{-1} + \ldots + a_n q^{-n}$,

$q^{-1}$ being a unit discrete time backward shift operator.
Integrated autoregressive (IAR) model 

This model is similar to the AR model except for the
integrated noise structure

  $A(q^{-1})\,y(k) = \frac{e(k)}{\Delta}$, $\Delta = 1-q^{-1}$.   (2.4.2a)

So

  $A(q^{-1})\,\Delta y(k) = e(k)$,

that is

  $\Delta y(k) + a_1 \Delta y(k-1) + \ldots + a_n \Delta y(k-n) = e(k)$,   (2.4.2b)

where $\Delta y(k) = y(k) - y(k-1)$.

Example 

The yearly variation of the earth's rotation rate {y(k)}
shown in Fig. 2.2.2 can be modelled as

  $\Delta y(k) = 1.0197\,\Delta y(k-1) - 0.6746\,\Delta y(k-3) + 0.508\,\Delta y(k-4)$
        $- 0.2361\,\Delta y(k-6) + 0.1981\,\Delta y(k-8) - 0.1487\,\Delta y(k-10) + e(k)$.

The model is parameterized using the method discussed in
Secs. 3.3.2-3.3.3. The data are given in Appendix 8B.

Autoregressive moving average (ARMA) models 

Here the noise is represented by an extended sequence:

  $A(q^{-1})\,y(k) = C(q^{-1})\,e(k)$,   (2.4.3a)

where

  $C(q^{-1}) = 1 + c_1 q^{-1} + \ldots + c_r q^{-r}$.

Thus (2.4.3a) is given by

  $y(k) + a_1 y(k-1) + \ldots + a_n y(k-n)$
        $= e(k) + c_1 e(k-1) + \ldots + c_r e(k-r)$.   (2.4.3b)

Here, for estimation of the parameters, it will be necessary
to know e(k-1), e(k-2), ..., e(k-r), for which estimated
values may be used. A typical approach for the estimation of
the noise follows.

Following (2.4.3a),

  $y(k) = (1-A(q^{-1}))\,y(k) + C(q^{-1})\,e(k)$.

So, the estimate $\hat{y}(k|k-1)$ can be expressed as

  $\hat{y}(k|k-1) = -\hat{a}_1 y(k-1) - \ldots - \hat{a}_n y(k-n)$
        $+ \hat{c}_1 \hat{e}(k-1) + \ldots + \hat{c}_r \hat{e}(k-r)$,   (2.4.4)

where $\hat{a}_i$, $\hat{c}_i$ etc. are estimated parameters. The estimate of
e(k) is given by $\hat{e}(k) = y(k) - \hat{y}(k|k-1)$. Initially the values
of $\hat{e}$ may be assumed to be zero.
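
The noise estimation described above is a simple recursion: each one-step prediction uses the earlier noise estimates, and each new noise estimate is the corresponding prediction error. The sketch below follows (2.4.4) for given parameter estimates; the function name and the numerical example are illustrative assumptions, and how the parameters themselves are estimated is not shown.

```python
def arma_residuals(y, a_hat, c_hat):
    """Return (y_pred, e_hat) following (2.4.4).

    y     : measured outputs y(0), y(1), ...
    a_hat : [a_1, ..., a_n] estimated AR parameters
    c_hat : [c_1, ..., c_r] estimated MA parameters
    Outputs and noise estimates before k = 0 are taken as zero.
    """
    y_pred, e_hat = [], []
    for k in range(len(y)):
        pred = 0.0
        for i, ai in enumerate(a_hat, start=1):
            if k - i >= 0:
                pred -= ai * y[k - i]
        for j, cj in enumerate(c_hat, start=1):
            if k - j >= 0:
                pred += cj * e_hat[k - j]
        y_pred.append(pred)
        e_hat.append(y[k] - pred)  # prediction error used as the noise estimate
    return y_pred, e_hat

# Data generated by y(k) = 0.5 y(k-1) + e(k) + 0.3 e(k-1)
# with e = [1.0, -2.0, 0.5, 0.0] and zero initial conditions.
y = [1.0, -1.2, -0.7, -0.2]
y_pred, e_hat = arma_residuals(y, a_hat=[-0.5], c_hat=[0.3])
```

With the true parameters and matching initial conditions, as in this example, the recursion recovers the generating noise sequence; with estimated parameters it gives only an approximation.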

Autoregressive integrated moving average (ARIMA) model 

Similar to (2.4.2a), here an integrated noise structure is
considered:

  $A(q^{-1})\,y(k) = \frac{C(q^{-1})\,e(k)}{\Delta}$.   (2.4.5)

That is

  $\Delta y(k) + a_1 \Delta y(k-1) + \ldots + a_n \Delta y(k-n)$
        $= e(k) + c_1 e(k-1) + \ldots + c_r e(k-r)$.

Autoregressive moving average model with exogenous
input (ARMAX)

This model is similar to ARMA, with additional input
variable(s) incorporated:

  $A(q^{-1})\,y(k) = B(q^{-1})\,u(k-d) + C(q^{-1})\,e(k)$,   (2.4.6)

where

  $B(q^{-1}) = b_0 + b_1 q^{-1} + \ldots + b_m q^{-m}$;

d is the time delay between the input u and the output y,
that is a change in u results in a change in the output y
after d time-steps. ARMAX and CARMA or Controlled ARMA
models are of the same category.
Autoregressive integrated moving average model with
exogenous input (ARIMAX)

This model is similar to the ARIMA model with additional
exogenous input variable(s) incorporated:

  $A(q^{-1})\,y(k) = B(q^{-1})\,u(k-d) + \frac{C(q^{-1})\,e(k)}{\Delta}$.   (2.4.7)

The CARIMA (i.e. Controlled ARIMA) model has the same
structure; the use of CARIMA models in process control
features in Chapters 12 and 13.

Remarks 

(1) The models discussed here are algebraically similar to
the regression model

  $y(k) = a_0 + a_1 x_1(k) + \ldots + a_n x_n(k) + e(k)$.

Here, each regressor vector is a time series by itself.

(2) The estimation of autoregressive parameters can be more
difficult than the moving average parameters due to the
inherent coupling between the output y(k) and the
corresponding time delayed variables y(k-1), y(k-2), etc.


2.4.2 Model Structures 

The representative estimation of the parameters and the 
validity of the model are the main considerations which 
influence the choice of the structure of the model. It is 
desirable that the estimates are true, yet the estimation
procedure is simple; for this, one of the common
requirements is that the noise should be uncorrelated with the
data. Again, the long term validity of the model requires 
the stationarity of the data. 

Preprocessing of data 

Appropriate preprocessing of the data can help the 
estimation procedure and ensure validity of the estimates. 
The most common approach for preprocessing is differencing. 
The differencing for {y(k)} can be as follows:

(a) unit time-step differencing, e.g., z(k) = y(k)-y(k-1),

(b) successive differencing, e.g., x(k) = z(k)-z(k-i), $i \ge 1$,

(c) periodic differencing, e.g., z(k) = y(k)-y(k-p), where
p is the period length, etc.

Example 

The German unemployment series {y(k)} (Appendix 7E) shown in
Fig. 14.3.3 is modelled as

  w(k) = -0.1164w(k-2) - 0.3039w(k-11) - 0.4062w(k-12) + e(k),

where w(k) = z(k) - z(k-1), and z(k) = y(k) - y(k-12).
Again, preprocessing of the data in terms of nonlinear 
transformation (discussed in Sec. 8.2.3) can improve 
stationarity. Nonlinear transformation is also used to
formulate linear-in-the-parameter models for nonlinear time 
series or processes. 

Estimation and the noise process 

The proper characterization of the noise is important from 
a parameter estimation point of view. The assumed noise 
structure has to conform to the requirements of the 
estimation method, as otherwise the estimates may not be 
valid. A convenient and hence popular assumption is that the 
noise process is white. In such cases the least squares 
method (Sec.3.3), which is one of the simplest methods of 
parameter estimation, can be used, as it produces optimal 
estimates. 

In real life, the noise is rarely white with Gaussian 
distribution. Therefore the model is configured so that the 
noise has a convenient structure. The integrated noise 
structures considered in the case of IAR, ARIMA and ARIMAX 
models are produced with such an objective. The differencing 
of the data implicit with these models tends to result in 
the noise being uncorrelated with the data. 

Consider the characteristics of the data from frequency 
content point of view. For every process, the frequency 
components present in the data are expected to lie within a 
certain range. Very low frequency components are responsible 
for the mean or zero frequency component (also referred to 
as the DC or average component, in analogy to electrical 
currents) and the trend component present in the data. Since 
the variations in the magnitude of these components are 
small, estimation of the corresponding parameters is
difficult and hence should be avoided. Therefore, it is 
desirable to configure the model such that the low frequency 
components are not present; this can be implemented by 
high-pass filtering (Appendix 14A) of the data before 
estimation; simple time-differencing is a basic form of 
high-pass filtering. 

In this connection, there are two important issues to 
be considered: (i) the noise associated with the data often 
contains high frequency components (typical examples are 
spikes, sudden rises or falls in the level, outliers etc.), 
and (ii) high-pass filtering tends to accentuate high 
frequency noise. The effect of such noise can be eliminated 
by low-pass filtering (Appendix 14A) of the data. 

Thus modelling from the measured data has to satisfy 
three main requirements: 

(i) the noise or the component of uncertainty in the model 
should be uncorrelated with the data, 

(ii) it should not be required to estimate the parameters 
associated with low frequency components in the data, 

(iii) the estimator should be protected from the effects of 
noise associated with the data. 

These requirements can be satisfied as follows. 

The integrated noise structure in an IAR, ARIMA or
ARIMAX model involves the differencing of the data, which
amounts to high-pass filtering. One of the ways of
implementing low-pass filtering at the same time is to use a
noise observer polynomial $T(q^{-1})$ as shown below. Consider
an ARIMAX model (2.4.7):

  $\frac{A(q^{-1})\,y(k)}{T(q^{-1})} = \frac{B(q^{-1})\,u(k-d)}{T(q^{-1})} + \frac{C(q^{-1})\,e(k)}{\Delta\,T(q^{-1})}$,   (2.4.8)

or

  $A(q^{-1})\,\Delta y_f(k) = B(q^{-1})\,\Delta u_f(k-d) + e(k)$,   (2.4.9)

where

  $y_f(k) = y(k)/T(q^{-1})$, $u_f(k) = u(k)/T(q^{-1})$,

and it is assumed that $C(q^{-1}) = T(q^{-1})$. A typical choice for
$T(q^{-1})$ is

  $T(q^{-1}) = \ldots$

Here, $T(q^{-1})$ acts as a first-order filter with a steady
state gain of unity.

The differencing of the data together with the use of
the noise observer $T(q^{-1})$ is akin to bandpass filtering (see
Appendix 14A); so the data ($\Delta y_f$, $\Delta u_f$ etc.) available to the
estimator in (2.4.9) are effectively bandpass filtered data.
It may also be noted that the model is so configured that
the noise {e(k)} works out to be uncorrelated by hypothesis.
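
The combined effect of the noise observer and the differencing can be sketched numerically. The example below assumes, purely for illustration, the first-order form T(q^-1) = (1 - t q^-1)/(1 - t) with t = 0.8, which has unity steady state gain; this particular form, the value of t and the function names are assumptions and are not taken from the text.

```python
def observer_filter(y, t=0.8):
    """y_f(k) = y(k) / T(q^-1) for the assumed T(q^-1) = (1 - t q^-1) / (1 - t).

    Equivalent recursion: y_f(k) = t * y_f(k-1) + (1 - t) * y(k), y_f(-1) = 0.
    """
    yf, prev = [], 0.0
    for yk in y:
        prev = t * prev + (1.0 - t) * yk
        yf.append(prev)
    return yf

def bandpass(y, t=0.8):
    """Delta y_f(k): low-pass filter with 1/T(q^-1), then unit differencing."""
    yf = observer_filter(y, t)
    return [yf[k] - yf[k - 1] for k in range(1, len(yf))]

constant = [5.0] * 200                              # zero frequency component
alternating = [(-1.0) ** k for k in range(200)]     # highest frequency component

yf_constant = observer_filter(constant)     # settles at 5: unity steady state gain
bp_constant = bandpass(constant)            # settles at 0: the level is removed
yf_alternating = observer_filter(alternating)  # amplitude attenuated well below 1
```

The constant level passes through the observer unchanged but is removed by the differencing, while the fastest oscillation is attenuated by the observer; only the intermediate band is retained.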

Remarks 

(1) If the frequency range of the noise components and that
of the true signal overlap, the elimination of noise will
also decimate the information in the data, which is
undesirable. In such cases, the effects of noise may be
reduced through optimal estimation discussed in Sec. 6.6.

(2) The noise observer polynomial $T(q^{-1})$ may also be used
in ARMA models (2.4.3) and ARMAX models (2.4.6).

(3) Models like (2.4.8) (i.e. incorporating the use of
$T(q^{-1})$) feature in Chapters 5 and 12.


2.4.3 Other Models 

Some transfer-function models which are popularly used in 
long range predictive control (see Secs. 12.4 and 12.5) are 
the pulse response and the step response models, which are 
discussed here. 

Impulse response models 

Consider an input signal {u(t)} which is a pulse of
magnitude $u_0$ and duration T as shown in Fig. 2.4.1(a). The
intensity of the signal is given by the area $u_0 T$. As T is
reduced, $u_0$ has to be increased if the intensity is to
remain unchanged. This leads to the limiting condition with
T tending to 0, when the pulse is of infinitesimal
duration, and is of infinite magnitude; such a signal is
called an impulse (Fig. 2.4.1(b)). When the area under the
impulse is unity, it is referred to as a unit impulse.

Figure 2.4.1 (a) A unit pulse, (b) a unit impulse.

Figure 2.4.2 System response to an impulse input.

The unit impulse function, also known as the Dirac
delta function, was originally defined by mathematician and
physicist P.A.M. Dirac as

  $\int_{-\infty}^{\infty} \delta(t)\,dt = 1$,

  $\delta(t) = 0$, $t \ne 0$;

that is the unit impulse function $\delta(t)$ is zero except at
t=0, where it covers a unit area.

The impulse response h(t), i.e. the response of a
system with the unit impulse signal $\delta(t)$ as the input,
completely characterizes a system in continuous-time
representation:

  $y(t) = \int_{0}^{\infty} h(\tau)\,u(t-\tau)\,d\tau$.   (2.4.10)

For unit impulse input $u(t) = \delta(t)$, h(t) = y(t).

In practice, when a system is excited by an input $\delta(t)$
at t=0, the output y(t) will settle after a time $T_s$ (see
Fig. 2.4.2), called the settling time. Hence (2.4.10) can be
expressed as

  $y(t) = \int_{0}^{T_s} h(\tau)\,u(t-\tau)\,d\tau$.   (2.4.11)

The inputs that occurred earlier than the time $T_s$ in the
past do not influence the present output.

The impulse response is also referred to as the
weighting function.

Remarks

(a) The convolution of two functions $f_1(t)$ and $f_2(t)$ is
given by the convolution integral:

  $y(t) = f_1(t) * f_2(t) = \int_{-\infty}^{\infty} f_1(\tau)\,f_2(t-\tau)\,d\tau$.

Thus Equation (2.4.11) is a convolution integral subject to
the causality condition: $h(\tau) = 0$, for $\tau < 0$.

(b) Equation (2.4.10) shows that the impulse response is 
directly given by the parameters of the continuous-time 
model. 


Pulse response model 

A unit pulse is defined as a pulse with unit magnitude and
of unit sample-time duration:

  u(0) = 1,
  u(k) = 0, for $k \ne 0$.

In discrete time representation, a single-input single-output
system is completely characterized by the pulse
response (also loosely called the impulse response), given
by the response of the system to a unit pulse (Fig. 2.4.3).

The discrete time equivalent of the convolution
integral (2.4.10) is given by

  $y(k) = \sum_{j=0}^{\infty} h(j)\,u(k-j)$.   (2.4.12)

For unit pulse input, y(k) = h(k). Following (2.4.12),

  y(k) = h(0)u(k) + h(1)u(k-1) + h(2)u(k-2) + ...
         + h(n)u(k-n) + ...



Figure 2.4.3 Pulse response of a system.

That is

  $y(k) = \left[\sum_{j=0}^{\infty} h(j)\,q^{-j}\right] u(k) = H(q^{-1})\,u(k)$,   (2.4.13)

where $H(q^{-1})$ is referred to as the pulse-transfer function.
Since by definition, u(0) = 1, and u(k) = 0 for $k \ne 0$, the
values of y(k) for k = 0, 1, 2, etc. will be given by the
coefficients of $H(q^{-1})$:

  y(0) = h(0),
  y(1) = h(1),
  y(2) = h(2), etc.

Usually y(0) = 0 = h(0). If the settling time is N samples,

  y(N+i) = h(N+i) = 0, i = 1, 2, ..., etc.

Thus the function h(k), called the pulse response, is
obtained by applying a unit pulse input and measuring the
system output at successive sampling instants.

If the system is expressed by the discrete time model

  $A(q^{-1})\,y(k) = B(q^{-1})\,u(k)$,   (2.4.14)

where

  $A(q^{-1}) = 1 + a_1 q^{-1} + a_2 q^{-2} + \ldots + a_n q^{-n}$,
  $B(q^{-1}) = b_0 + b_1 q^{-1} + b_2 q^{-2} + \ldots + b_n q^{-n}$,

y(k) can be expressed by the transfer-function model

  $y(k) = \frac{B(q^{-1})}{A(q^{-1})}\,u(k) = H(q^{-1})\,u(k)$,

which shows the equivalence between the transfer-function
model and the pulse response model.
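
The equivalence can be checked numerically: driving the model (2.4.14) with a unit pulse yields the coefficients h(k), and convolving those coefficients with an input as in (2.4.12) reproduces the output. The sketch below assumes zero initial conditions and a truncated pulse response; the function names and the first-order example are illustrative.

```python
def pulse_response(a, b, n):
    """First n pulse response coefficients h(0..n-1) of A(q^-1) y = B(q^-1) u.

    a = [a_1, a_2, ...]      (the leading 1 of A is implied)
    b = [b_0, b_1, b_2, ...]
    Obtained by applying a unit pulse u(0) = 1 to the difference equation.
    """
    h = []
    for k in range(n):
        hk = b[k] if k < len(b) else 0.0   # contribution of B(q^-1) u(k)
        for i, ai in enumerate(a, start=1):
            if k - i >= 0:
                hk -= ai * h[k - i]
        h.append(hk)
    return h

def convolve_response(h, u):
    """y(k) = sum over j of h(j) u(k-j), truncated to the available h and u."""
    return [sum(h[j] * u[k - j] for j in range(min(k, len(h) - 1) + 1))
            for k in range(len(u))]

# Example: A = 1 - 0.5 q^-1, B = q^-1 (one sample of delay).
h = pulse_response([-0.5], [0.0, 1.0], 4)        # [0.0, 1.0, 0.5, 0.25]
y = convolve_response(h, [1.0, 0.0, 0.0, 0.0])   # unit pulse input returns h
```

Two parameters describe this process in transfer-function form, whereas the pulse response needs one coefficient per sample up to the settling time.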

Remarks 

(a) The transfer-function model (2.4.14) requires fewer
parameters to describe the system behaviour compared to the
pulse response model (2.4.13).

(b) The pulse response is directly given by the parameter
values of $H(q^{-1})$ in the model: $y(k) = H(q^{-1})\,u(k)$.

(c) A process expressed by a pulse response model is
open-loop stable, as otherwise the pulse response cannot be
valid.

(d) If the process has a dead time d (i.e. if y does not
show any change for d number of pulse durations after u is
changed), $h_1 = h_2 = \ldots = h_{d-1} = 0$.

Step response model 

A unit step u is defined as follows: 

(a) in continuous-time representation,

    u(t) = 0 for t < 0,
    u(t) = 1 for t ≥ 0,

(b) in discrete-time representation,

    u(k) = 0 for k < 0,
    u(k) = 1 for k ≥ 0.

Thus a unit step is the first time-integral of a unit impulse
(or a unit pulse) occurring at the time instant (or the
sampling instant) at which the discontinuity appears in the
step function. In discrete-time representation, a step is
equivalent to a train of pulses appearing at consecutive
sampling instants. Hence

$$ s_0 = h_0, $$
$$ s_1 = h_1 + h_0, $$
$$ s_2 = h_2 + h_1 + h_0, $$
$$ s_p = \sum_{j=0}^{p} h_j, \tag{2.4.15} $$

etc., where $\{s_i\}$ and $\{h_j\}$ stand for the step response and the pulse
response respectively. Thus, unlike the pulse response, the step
response $\{s_i\}$ does not decay to zero; for a stable process it
settles at the steady-state gain. Following (2.4.13), the step response is given by

$$ y(k) = \sum_{i=0}^{\infty} s_i\,u(k-i) = S(q^{-1})u(k) = \frac{H(q^{-1})}{\Delta}\,u(k), \quad \Delta = 1 - q^{-1}, \tag{2.4.16} $$

where

$$ S(q^{-1}) = \sum_{i=0}^{\infty} s_i\,q^{-i} $$

is the step response transfer function (see Fig.2.4.4).

Figure 2.4.4 Step response of a second order system
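
The relation (2.4.15) between the pulse response and the step response amounts to a running sum, and its inverse to a first difference. A minimal Python sketch follows; the pulse response values and the function names are hypothetical, chosen only for illustration.

```python
from itertools import accumulate


def step_from_pulse(h):
    """s_p = h_0 + h_1 + ... + h_p, as in (2.4.15)."""
    return list(accumulate(h))


def pulse_from_step(s):
    """Inverse relation: h_0 = s_0 and h_p = s_p - s_(p-1)."""
    return [s[0]] + [s[k] - s[k - 1] for k in range(1, len(s))]


# Hypothetical pulse response of a stable first-order process
h = [0.0, 1.0, 0.5, 0.25, 0.125]
s = step_from_pulse(h)
h_back = pulse_from_step(s)
```

For this decaying pulse response the step response rises monotonically towards the sum of all the pulse response coefficients.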


2.5 MODELS BASED ON FREQUENCY DOMAIN ANALYSIS 

A periodic or nonperiodic time series or signal can be
composed of a number of sinusoidal components, each having a
particular frequency. So the series can be modelled in terms of sine
and cosine functions (or exponential
functions) of the constituent frequency components. In this
connection, the sampling rate, or the number of data points
available within a certain time span, is also important,
because the sampling rate limits the highest frequency
component that can be present in the sampled signal.

This section introduces the concepts of modelling
periodic and aperiodic signals in terms of the constituent
frequency components. The frequency domain representation of
a periodic signal in terms of a Fourier series and that of a
nonperiodic signal in terms of a Fourier transform are
described. Some of the pioneering work in the area of
frequency domain representations was done by the French
mathematician J.B.J. Fourier (1768-1830).

The sampling aspects of a time series or signal are
also studied in this section.


2.5.1 Representation of a Periodic Signal and 
the Fourier Series 

Consider a continuous periodic signal f(t), with period T
(Fig.2.5.1). By definition, a periodic signal repeats itself
after the period T, that is

$$ f(t) = f(t+T) = f(t+nT), \quad n = \pm 1, \pm 2, \dots $$

The objective is to represent f(t) in terms of functions of
constituent frequency components. An obvious choice is to
use sinusoidal functions. Define $\omega_0$ to be the angular
frequency corresponding to the time period T, i.e. $\omega_0 = 2\pi/T$. Since

$$ \cos\omega t = \cos(\omega t \pm 2\pi) \quad\text{and}\quad \sin\omega t = \sin(\omega t \pm 2\pi), $$

$$ \cos\omega_0 t = \cos(\omega_0 t + 2\pi n) = \cos\omega_0(t+nT), \quad n = \pm 1, \pm 2, \dots \tag{2.5.1a} $$

Similarly,

$$ \sin\omega_0 t = \sin\omega_0(t+nT). \tag{2.5.1b} $$

Hence it should be possible to represent f(t) in terms of
the sinusoidal functions.

Figure 2.5.1 A continuous-time periodic signal f(t)
having period length T.


The Fourier series 


Consider the infinite series

$$ a_0 + (a_1\cos\omega_0 t + b_1\sin\omega_0 t) + (a_2\cos 2\omega_0 t + b_2\sin 2\omega_0 t) + \dots + (a_n\cos n\omega_0 t + b_n\sin n\omega_0 t) + \dots $$

If this series converges to f(t), it will also converge to
f(t+T), following (2.5.1), leading to the following deductions:

Deduction A: The summation of sinusoidal components with
angular frequencies $0, \omega_0, 2\omega_0, \dots, n\omega_0, \dots$ produces a periodic
signal with period T, where $T = 2\pi/\omega_0$.

Here the zero frequency represents the constant
component or the mean value. The converse of this result
also holds as follows.

Deduction B: Any periodic signal f(t), with period T, can be
decomposed into an infinite number of additive sinusoidal
components with angular frequencies $0, \omega_0, 2\omega_0, \dots, n\omega_0, \dots$:

$$ f(t) = a_0 + \sum_{n=1}^{\infty} (a_n\cos n\omega_0 t + b_n\sin n\omega_0 t). \tag{2.5.2} $$

The magnitudes of the coefficients $a_n$ and $b_n$ define the
pattern of the function f(t). The series in (2.5.2) is
called the trigonometric Fourier series.

One of the strong features of the sinusoidal functions in
(2.5.2) is their property of orthogonality, which is used in
computing the coefficients $a_n$ and $b_n$. The condition of
orthogonality is stated as follows.

A set of continuous functions

$$ \{v_m(t)\} = \{v_0(t), v_1(t), \dots\} $$

is orthogonal over the interval $(t_0, t_0+T)$, where $t_0$ is
arbitrary, if

$$ \int_{t_0}^{t_0+T} v_m(t)\,v_n(t)\,dt = \begin{cases} D, & \text{if } m = n, \\ 0, & \text{if } m \neq n, \end{cases} $$

D being a constant.

The sine and cosine functions in (2.5.2) show orthogonality
as follows:

$$ \int_{t_0}^{t_0+T} \cos m\omega_0 t\,\cos n\omega_0 t\,dt = \begin{cases} T/2, & m = n \neq 0, \\ 0, & m \neq n, \end{cases} \tag{2.5.3a} $$

$$ \int_{t_0}^{t_0+T} \cos m\omega_0 t\,\sin n\omega_0 t\,dt = 0, \quad \text{for all } m, n, \tag{2.5.3b} $$

$$ \int_{t_0}^{t_0+T} \sin m\omega_0 t\,\sin n\omega_0 t\,dt = \begin{cases} T/2, & m = n \neq 0, \\ 0, & m \neq n. \end{cases} \tag{2.5.3c} $$

Following (2.5.2) and (2.5.3), the coefficients $a_n$ and $b_n$
can be determined as follows.

$$ a_0 = \frac{1}{T}\int_{t_0}^{t_0+T} f(t)\,dt, \tag{2.5.4a} $$

$$ a_n = \frac{2}{T}\int_{t_0}^{t_0+T} f(t)\cos n\omega_0 t\,dt, \tag{2.5.4b} $$

$$ b_n = \frac{2}{T}\int_{t_0}^{t_0+T} f(t)\sin n\omega_0 t\,dt. \tag{2.5.4c} $$

The $a_n$ and $b_n$ coefficients are guaranteed to exist subject
to the Dirichlet conditions:

(a) the integral of |f(t)| over one period should exist, that is

$$ \int_{t_0}^{t_0+T} |f(t)|\,dt < \infty, $$

and

(b) f(t) must be finite or have a finite number of
discontinuities in one period.
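
The coefficient formulas (2.5.4a)-(2.5.4c) can be evaluated numerically. The sketch below is an illustration only: it assumes a square wave of period T = 2 as the test signal and uses a simple midpoint rule for the integrals; the function names and the number of integration points are arbitrary choices.

```python
import math


def fourier_coefficients(f, T, n_max, n_points=2000, t0=0.0):
    """Approximate a0, a_n and b_n of (2.5.4) by the midpoint rule over one period."""
    w0 = 2.0 * math.pi / T
    dt = T / n_points
    ts = [t0 + (i + 0.5) * dt for i in range(n_points)]
    a0 = sum(f(t) for t in ts) * dt / T
    a = [2.0 / T * sum(f(t) * math.cos(n * w0 * t) for t in ts) * dt
         for n in range(1, n_max + 1)]
    b = [2.0 / T * sum(f(t) * math.sin(n * w0 * t) for t in ts) * dt
         for n in range(1, n_max + 1)]
    return a0, a, b


def square_wave(t, T=2.0):
    """Hypothetical test signal: +1 over the first half period, -1 over the second."""
    return 1.0 if (t % T) < T / 2.0 else -1.0


a0, a, b = fourier_coefficients(square_wave, 2.0, 5)
# a[n-1] and b[n-1] hold a_n and b_n for n = 1, ..., 5
```

For this odd-symmetric signal the mean and the cosine coefficients vanish, and the sine coefficients approach the known values 4/(nπ) for odd n and zero for even n.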

Cosine series representation 

The series (2.5.2) can also be expressed in terms of a
cosine series, as

$$ f(t) = c_0 + \sum_{n=1}^{\infty} c_n\cos(n\omega_0 t + \theta_n), \tag{2.5.5} $$

where

$$ c_0 = a_0, \quad c_n^2 = a_n^2 + b_n^2, \quad \theta_n = -\tan^{-1}(b_n/a_n). $$

$c_0$ is the constant component, expressed as the average value
of f(t) over one period as in (2.5.4a).


Exponential Fourier series 

Instead of the sinusoidal functions, the periodic signal
f(t) can also be expressed in terms of exponential functions
(see Appendix 2):

$$ f(t) = \sum_{n=-\infty}^{\infty} g_n\,e^{in\omega_0 t}, \quad i = \sqrt{-1}. \tag{2.5.6} $$

Equation (2.5.6) is the complex or exponential Fourier
series representation of f(t). The coefficients $g_n$ are
given by

$$ g_n = \frac{1}{T}\int_{t_0}^{t_0+T} f(t)\,e^{-in\omega_0 t}\,dt, \tag{2.5.7} $$

where $t_0$ is arbitrary, and $g_0 = a_0$, as in (2.5.4a).

Thus the exponential Fourier series represents the
spectrum of f(t) (that is, the amplitudes of the components of f(t) at
various discrete frequencies), which is referred to as a
discrete spectrum or a line spectrum.


2.5.2 Representation of a Nonperiodic Signal and 
the Fourier Transform 

A nonperiodic signal can be expressed as a continuous sum of
exponential functions of frequencies lying in the interval
$-\infty < \omega < \infty$. The mathematical expression is developed as
follows:

(a) The nonperiodic signal f(t) of finite length in time,
$\tau$, is considered to be a part of an augmented periodic
signal, $f_a(t)$, with period $T \geq \tau$ (Figs.2.5.2 and 2.5.3).

(b) The augmented periodic signal is expressed by the
exponential series (2.5.6).

(c) The limit conditions, $T \to \infty$ and the corresponding angular
frequency $\omega_0 \to 0$, applied to $f_a(t)$ lead to the
continuous frequency representation of f(t).

Figure 2.5.2 A nonperiodic process of length $\tau$.

Figure 2.5.3 An augmented periodic process with
period $T \geq \tau$.

Following (2.5.6)-(2.5.7), the periodic process $f_a(t)$ can be
expressed by the exponential series

$$ f_a(t) = \sum_{n=-\infty}^{\infty} g_{an}\,e^{in\omega_0 t}, \quad \omega_0 = 2\pi/T, $$

with the complex coefficients

$$ g_{an} = \frac{1}{T}\int_{-T/2}^{T/2} f_a(t)\,e^{-in\omega_0 t}\,dt. $$

As the time period T increases and tends to infinity, $\omega_0$
tends to an infinitesimally small value $d\omega$, and $n\omega_0$ becomes
the continuous angular frequency $\omega$. Hence introduce

$$ F(\omega) = \lim_{T\to\infty} T g_{an} = \lim_{T\to\infty}\int_{-T/2}^{T/2} f(t)\,e^{-in\omega_0 t}\,dt. \tag{2.5.8} $$

Again,

$$ f(t) = \lim_{T\to\infty} \frac{1}{T}\sum_{n=-\infty}^{\infty} F(\omega)\,e^{in\omega_0 t}. $$

Since

$$ \lim_{T\to\infty}\frac{1}{T} = \lim_{\omega_0\to 0}\frac{\omega_0}{2\pi}, $$

$$ f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)\,e^{i\omega t}\,d\omega, \tag{2.5.9} $$

where, following (2.5.8),

$$ F(\omega) = \int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt. \tag{2.5.10} $$

The existence of $F(\omega)$ is subject to the condition that f(t)
is absolutely integrable in the time interval $(-\infty, \infty)$.

Equations (2.5.10) and (2.5.9) are referred to as the
Fourier transform of f(t) and the inverse Fourier transform
of $F(\omega)$ respectively. Since $F(\omega)$ represents the frequency
spectrum of f(t), it is called the spectral density
function.


Some common functions in time and their Fourier transforms
are shown in Fig.2.5.4; the following features may be noted:

(a) The Fourier transform of the rectangular pulse in
Fig.2.5.4(a) is given by

$$ F(\omega) = \tau\,\frac{\sin(\omega\tau/2)}{(\omega\tau/2)}. $$

This function is referred to as the sampling function.

(b) For the sine function in Fig.2.5.4(c), the phase of $F(\omega)$
at $\omega_0$ and $-\omega_0$ will be $\pi/2$ and $-\pi/2$ respectively.

(c) It is evident from Fig.2.5.4(b) and Fig.2.5.4(c) that
sinusoidal functions (and hence exponential functions) cause
a frequency translation in the transform domain.
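
Feature (a) can be verified numerically from the definition (2.5.10). The sketch below is illustrative only: it assumes a rectangular pulse of unit height and width τ = 1 centred at the origin (the figure itself is not reproduced here), and approximates the integral with a midpoint rule over the support of the pulse. The function names are hypothetical.

```python
import cmath
import math


def fourier_transform(f, w, t_min, t_max, n_points=4000):
    """Midpoint-rule approximation of F(w) = integral of f(t) exp(-i w t) dt."""
    dt = (t_max - t_min) / n_points
    total = 0.0 + 0.0j
    for i in range(n_points):
        t = t_min + (i + 0.5) * dt
        total += f(t) * cmath.exp(-1j * w * t)
    return total * dt


def rect(t, tau=1.0):
    """Assumed pulse: unit height, width tau, centred at t = 0."""
    return 1.0 if abs(t) <= tau / 2.0 else 0.0


def sampling_function(w, tau=1.0):
    """tau * sin(w tau / 2) / (w tau / 2), with the limit value tau at w = 0."""
    x = w * tau / 2.0
    return tau if x == 0.0 else tau * math.sin(x) / x


# Compare the numerical transform with the closed form at a few frequencies
errors = [abs(fourier_transform(rect, w, -0.5, 0.5) - sampling_function(w))
          for w in (0.0, 1.0, 5.0, 2.0 * math.pi)]
```

Because the pulse is even in t, the numerical transform is real to within rounding error, and it vanishes at ω = 2π/τ as the closed form predicts.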


Remarks 

(1) Both the Fourier transform and the Fourier series
concern the decomposition of a signal into constituent
components having specific frequencies. The Fourier transform of
an aperiodic signal is a linear combination of
exponential functions occurring at a
continuum of frequencies (which means the corresponding
frequency spectrum is a continuous spectrum), whereas the
Fourier series of a periodic signal is a linear combination
of exponential functions occurring at
discrete frequencies (i.e. the corresponding frequency
spectrum is a discrete or line spectrum).

(2) For a signal spanning a finite time $\tau$, the values of the
Fourier transform $F(\omega)$ at the points $\omega = 2\pi n/\tau$ will be the
same (apart from the scale factor $\tau$ implied by (2.5.8)) as the
discrete spectrum produced by the Fourier series
of the corresponding augmented periodic function with period
$\tau$. The envelope of the discrete spectrum due to the Fourier
series will be the same as the envelope of the continuous
Fourier transform spectrum.

(3) The Fourier transform is also an orthogonal transform.

Figure 2.5.4 Some common time functions and their
Fourier transform representations.
(a) A rectangular pulse, (b) a cosine function,
(c) a sine function, (d) a constant value,
(e) a unit impulse, (f) a uniform pulse sequence.


2.5.3 Discrete-time Signals and their Fourier Transform 

Although most real-life processes are inherently continuous-time
processes (for example, the population of a country or
the temperature of the molten metal tapped from a
blast furnace), for ease of computation in modelling or
other applications, a discrete-time representation of the
process is considered. A discrete-time representation
implies measurement or sampling of the process variable at
discrete time intervals. The sampling rate (i.e. the number
of samples per unit time) is of fundamental significance for
any discrete-time representation.

The sampling theorem 

A data sequence or signal may contain a number of sinusoidal
components.

The sampling theorem states that the sampling frequency
$f_s$, in number of samples per second or Hertz (abbreviated as
Hz), has to be at least twice the highest frequency
component, $f_m$, present in the signal (i.e. $f_s \geq 2f_m$). In
other words, if a continuous-time signal is sampled at a
frequency $f_s$, the sampled signal will contain all the
frequency components of the original signal which are less than
or equal to $f_c = f_s/2$. The frequency $f_c$ is called the
Nyquist critical frequency, and the minimum admissible sampling
frequency, $2f_m$, is called the Nyquist rate of sampling.

There are two basic implications of the sampling theorem:

(i) Only frequency components lower than or equal to $f_c$ can
be used to form the original signal.

(ii) If the signal is not band limited to less than the
Nyquist critical frequency, $f_c$, all the power spectral
density outside the frequency range $-f_c < f < f_c$
spuriously moves into the frequency range $(-f_c, f_c)$.
This phenomenon is called aliasing.

The sampling theorem is illustrated with the help of an
example (Fig.2.5.5). Consider a continuous-time signal f(t),
which is band limited in frequency, i.e. the Fourier transform
(Fig.2.5.5(b)) shows no frequency components beyond
$\omega_m$ radians per second (or $f_m$ Hz, $\omega_m = 2\pi f_m$). Sampling of f(t)
every $T_s$ seconds is equivalent to multiplying
f(t) by a uniform train of impulses $\delta_{T_s}(t)$ having the
period $T_s$ (the operation labelled as convolution in
Fig.2.5.5(e)-(f); strictly, the convolution takes place between
the two spectra in the frequency domain). The impulse train is
referred to as the sampling function with the sampling period
$T_s$, and its fundamental angular frequency $\omega_s$ (= $2\pi/T_s$)
as the sampling frequency.
The sampled output f'(t) is a train of impulses
having the magnitude of f(t) at equispaced time intervals $T_s$,
given by

$$ f'(t) = f(t)\,\delta_{T_s}(t). $$

The Fourier transform of f'(t) (Fig.2.5.5(g)) shows clear
separation of the frequency components when the sampling
frequency $\omega_s > 2\omega_m$.

The importance of the sampling frequency being at least
twice the maximum frequency component present in the signal
is illustrated in Fig.2.5.6. The spectrum of the sampled
signal, $F'(\omega)$, fails to contain the true representation of
$F(\omega)$ when $\omega_s < 2\omega_m$. This phenomenon is called aliasing:
the higher frequencies in the signal appear as lower
frequencies in the spectrum, causing corruption of f'(t).
Since the higher frequency components (those between $\omega_s/2$ and $\omega_m$)
are, as it were, folded back about the frequency $\omega_s/2$ into the lower
frequency range, this phenomenon of aliasing is also called
frequency folding. To avoid aliasing, an anti-aliasing
filter is often used, which is basically a low-pass filter
attenuating frequency components higher than $\omega_s/2$; low-pass
filtering is discussed in Appendix 14A.
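
Frequency folding can be seen directly in sampled data. The following Python sketch is an illustration with assumed numbers (a 7 Hz cosine sampled at 10 Hz); the function names are hypothetical. It shows that the samples are indistinguishable from those of a 3 Hz cosine, i.e. the component has been folded about $f_s/2$ = 5 Hz.

```python
import math


def sample_cosine(freq_hz, fs_hz, n_samples):
    """Samples of cos(2 pi f t) taken at t = k / fs, k = 0, ..., n_samples-1."""
    return [math.cos(2.0 * math.pi * freq_hz * k / fs_hz) for k in range(n_samples)]


def alias_frequency(freq_hz, fs_hz):
    """Apparent frequency, within [0, fs/2], of a real sinusoid after sampling."""
    f = freq_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2.0 else f


fs = 10.0                                    # assumed sampling frequency in Hz
true_samples = sample_cosine(7.0, fs, 20)    # 7 Hz lies above fs/2 = 5 Hz
alias_samples = sample_cosine(alias_frequency(7.0, fs), fs, 20)
max_difference = max(abs(p - q) for p, q in zip(true_samples, alias_samples))
```

The two sample sequences coincide to rounding error, so no processing applied after sampling can separate the 7 Hz component from a genuine 3 Hz one; this is why the anti-aliasing filter has to act before the sampler.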

The original signal f(t) can be recovered from the
sampled signal f'(t) by passing the sampled signal through
an ideal low-pass filter having a cut-off frequency greater
than $\omega_m$ but less than $(\omega_s - \omega_m)$. A separation between $\omega_m$ and
$(\omega_s - \omega_m)$ is preferred; in practice, most designers like to
have a sampling rate at least 20% higher than the minimum
desirable rate based on frequency considerations.

Figure 2.5.5 Sampling of a continuous-time signal and
the frequency spectrum of the sampled signal.
(a) The continuous-time signal f(t).
(b) The spectrum of f(t).
(c) Uniform pulse train $\delta_{T_s}(t)$.
(d) Fourier transform of $\delta_{T_s}(t)$; sampling frequency = $\omega_s$.
(e) Convolution of two signals f(t) and $\delta_{T_s}(t)$.
(f) The convolved signal f'(t) or the sampled signal f(t).
(g) $F'(\omega)$, the spectrum of f'(t), with $\omega_s > 2\omega_m$.

Figure 2.5.6 Effects of sampling frequency on the
frequency spectrum of the sampled signal,
(a) spectrum of original signal,
(b) spectrum of sampled signal with $\omega_s = 2\omega_m$,
(c) spectrum of sampled signal with $\omega_s < 2\omega_m$; aliasing
resulting from low sampling rate.


Discrete Fourier transform and fast Fourier transform


The discrete Fourier transform (DFT) is a way of specifying
the sinusoidal signals that constitute a finite discrete
data sequence. Given any N-long data sequence
$\{x(j)\} = \{x(0), x(1), \dots, x(N-1)\}$, its DFT is defined as

$$ x_w(m) = \frac{1}{N}\sum_{j=0}^{N-1} x(j)\exp(-i(2\pi/N)mj), \quad m = 0, 1, \dots, N-1, \tag{2.5.11} $$

and the inverse DFT is defined as

$$ x(j) = \sum_{m=0}^{N-1} x_w(m)\exp(i(2\pi/N)jm), \quad j = 0, 1, \dots, N-1. \tag{2.5.12a} $$

If the data are complex, the DFT $x_w(m)$ will be the complex
valued density functions of the N sinusoidal components,
having frequencies 0, 1/N, 2/N, ..., (N-1)/N; the frequencies
are the normalized frequencies. If the data are real valued
(which is usually the case in the present context), the
frequencies of the constituent sinewaves are 0, 1/N, 2/N,
..., (N/2)/N; if N is odd, the highest frequency component
is (N-1)/2N. The finest resolution between two adjacent
components produced by the DFT is 1/N.

For a real valued data sequence, $\{x(j)\}$ can also be
expressed as

$$ x(j) = x_w(0) + \sum_{n=1}^{N/2} \bigl(a_n\cos((n/N)2\pi j) + b_n\sin((n/N)2\pi j)\bigr), \tag{2.5.12b} $$

where $x_w(0)$ is the average value of the sequence $\{x(j)\}$, N is
the length of the sequence (assumed to be even), and

$$ a_n = \frac{2}{N}\sum_{j=0}^{N-1} x(j)\cos((n/N)2\pi j), \quad b_n = \frac{2}{N}\sum_{j=0}^{N-1} x(j)\sin((n/N)2\pi j), $$

for n < N/2; for n = N/2 the factor 2/N is replaced by 1/N.

There are various efficient algorithms for the computation
of the DFT; the best known of these are collectively
referred to as the fast Fourier transform (FFT) algorithms.
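
The DFT pair can be written directly from the defining sums. The Python sketch below is a plain O(N²) illustration, not an FFT; it assumes that the 1/N factor is placed on the forward transform, so that the zero-frequency coefficient equals the mean of the sequence. The test sequence and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Forward DFT with the 1/N factor (an assumed convention): xw[0] is the mean."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n)) / n
            for m in range(n)]


def inverse_dft(xw):
    """Inverse DFT matching dft() above (no scaling factor)."""
    n = len(xw)
    return [sum(xw[m] * cmath.exp(2j * math.pi * m * j / n) for m in range(n))
            for j in range(n)]


# Hypothetical data: mean 0.5 plus a unit-amplitude sine at normalized frequency 3/16
N = 16
x = [0.5 + math.sin(2.0 * math.pi * 3 * j / N) for j in range(N)]
xw = dft(x)
x_back = inverse_dft(xw)
amplitudes = [abs(c) for c in xw]
```

For this real valued sequence the spectrum is non-zero only at m = 0 (the mean) and at m = 3 and m = 13, the latter two each carrying half the amplitude of the sine; the components above N/2 mirror those below it, which is why only the frequencies up to (N/2)/N are listed for real data.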


Example 2.5.3 Model the yearly variations in the Earth's
rate of rotation (Appendix 8B) using Fourier transform

The concerned series (see Fig.2.2.2) has 150 data points; it
is padded with zeros to 1024 points, and a 1024-point FFT is
performed on the augmented series. The frequency spectrum
obtained is shown in Fig.2.5.7. The mean or average
component and the 9 strongest sinusoidal components in order
of magnitude are selected, which have the normalized
frequencies $f_i$ (i = 1 to 9) given by

    19/1024, 6/1024, 20/1024, 18/1024, 7/1024, 5/1024,
    21/1024, 17/1024, 8/1024.

Figure 2.5.7 The frequency spectrum of the series on
the yearly variations of the earth's rate of rotation.

The series may be modelled as

$$ y(k) = a_0 + \sum_{i=1}^{9} b_i\sin(2\pi k f_i) + e(k), $$

where e is the noise, and k = 1, ..., 150. Least squares
estimation is used to estimate the parameters, which are
obtained as follows.

    a_0 = -200.15, b_1 = 11653.86, b_2 = -5716.47, b_3 = -6891.67,
    b_4 = -9697.50, b_5 = 4644.21, b_6 = 2758.74, b_7 = 1611.39,
    b_8 = 3458.48, b_9 = -1431.74.

Remarks

(1) Here, although 150 data points were available, a
1024-point FFT was performed for higher resolution in the
detection of the frequency components present, which is
1/1024 here. The concerned extension of the data series by
zeros does not affect the result.

(2) In the model, only sine terms are considered for the
sake of simplicity; cosine terms could also be considered.


2.5.4 Modelling of a Periodic Signal

There are two basic features of a periodic signal, from the
sampling point of view:

(i) the smallest length of time for which the data are
available (the sampling period), and

(ii) the period length or time period at which the pattern
tends to repeat.

Consider the modelling of any nearly periodic series with
monthly data and yearly periodicity. Here,

(i) the sampling period, $T_s$, is one month, i.e. $T_s = 1$, and

(ii) the pattern tends to repeat every 12 months, i.e. the
time period $T = 12T_s$.

Any series which has a periodicity of T will have the
fundamental frequency component $\omega_0 = 2\pi/T$, and can be
modelled as a linear combination of cosine and sine
functions of $\omega_0$ and its harmonics $2\omega_0, 3\omega_0, \dots$, and an
average (or constant) term.

Again, according to the sampling theorem (Sec.2.5.3),
if a signal or time series is sampled at $T_s$ intervals, that
is $\omega_s = 2\pi/T_s$, the maximum frequency component that can be
present in the sampled data is given by $\omega_m \leq \omega_s/2$.


Example 2.5.4 Model the monthly Indian rainfall series
(Appendix 7F).

This series (shown in Fig.2.2.1) is a monthly data series,
and has yearly periodicity.

For modelling, assume $T_s = 1$, $\omega_s = 2\pi$. So

(a) $\omega_m \leq 6\omega_s/12$, since T = 12, $\omega_0 = 2\pi/12$,

(b) the frequency components contained in the sampled
signal are

    2π(1/12), 2π(2/12), 2π(3/12), 2π(4/12), 2π(5/12) and 2π(6/12).

Using the trigonometric Fourier series (2.5.12b), for the
d-th year,

$$ f_d(t) = a_0 + \sum_{n=1}^{6} \bigl(a_n\cos((n/12)2\pi t) + b_n\sin((n/12)2\pi t)\bigr), \tag{2.5.13} $$

where t = 1, 2, ..., 12, and the values of the time series
$f_d(t)$ are known. For n = 6,

$$ \cos((n/12)2\pi t) = \cos\pi t = (-1)^t, \quad \sin((n/12)2\pi t) = \sin\pi t = 0. $$

Again,

$$ a_n = \frac{2}{12}\sum_{t=1}^{12} f_d(t)\cos((n/12)2\pi t), \quad b_n = \frac{2}{12}\sum_{t=1}^{12} f_d(t)\sin((n/12)2\pi t), $$

where n = 1, 2, ..., 5, and

$$ a_0 = \frac{1}{12}\sum_{t=1}^{12} f_d(t), \quad a_6 = \frac{1}{12}\sum_{t=1}^{12} f_d(t)(-1)^t. $$

The values of the coefficients $a_n$ and $b_n$ so obtained pertain
to the data for the d-th year.

For the present series for the year 1941,

    a_0 = 54.28, a_1 = -57.41, a_2 = 1.22, a_3 = 10.06, a_4 = -5.58,
    a_5 = 3.15, a_6 = -4.01,
    b_1 = -57.13, b_2 = 56.15, b_3 = -16.58, b_4 = 8.64, b_5 = -4.82, b_6 = 0,

in the model (2.5.13).

Remarks

(1) Since data for a number of years are available, the
coefficients $a_n$ and $b_n$ in (2.5.13) may also be estimated
using the method of least squares; for example, in the
present case, there are 12 data values available for each
year for the estimation of the 12 parameters $a_0$ to $a_6$ and $b_1$
to $b_5$.

(2) Unlike the present case, if N is odd, there will be
(N-1)/2 sinusoidal components in the data $f_d(t)$ having
frequencies $n\omega_0$, where n = 1, 2, ..., (N-1)/2,
in addition to an average or the zero frequency component.
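
The coefficient formulas of Example 2.5.4 can be sketched in Python as follows. The twelve monthly values used here are hypothetical and are not the rainfall data of Appendix 7F; the function names are likewise illustrative. Since 12 parameters are computed from 12 data values, the model (2.5.13) reproduces the data of that year exactly.

```python
import math


def yearly_harmonics(f):
    """Coefficients a[0..6], b[0..6] of (2.5.13) from 12 monthly values f(1), ..., f(12)."""
    n_months = 12
    a = [0.0] * 7
    b = [0.0] * 7          # b[0] is unused and b[6] stays zero
    a[0] = sum(f) / n_months
    for n in range(1, 6):
        a[n] = 2.0 / n_months * sum(
            f[t - 1] * math.cos(2.0 * math.pi * n * t / 12.0) for t in range(1, 13))
        b[n] = 2.0 / n_months * sum(
            f[t - 1] * math.sin(2.0 * math.pi * n * t / 12.0) for t in range(1, 13))
    a[6] = sum(f[t - 1] * (-1) ** t for t in range(1, 13)) / n_months
    return a, b


def reconstruct(a, b, t):
    """Evaluate the model (2.5.13) at month t."""
    return a[0] + sum(
        a[n] * math.cos(2.0 * math.pi * n * t / 12.0)
        + b[n] * math.sin(2.0 * math.pi * n * t / 12.0)
        for n in range(1, 7))


# Hypothetical monthly values for one year (illustration only)
monthly = [5.0, 8.0, 12.0, 20.0, 45.0, 110.0, 180.0, 160.0, 95.0, 40.0, 12.0, 6.0]
a, b = yearly_harmonics(monthly)
fitted = [reconstruct(a, b, t) for t in range(1, 13)]
```

When several years of data are used instead, the same regressors (the cosine and sine terms) can be stacked over all the years and the coefficients estimated by least squares, as noted in Remark (1).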


2.6 STRUCTURAL MODELLING 

In structural modelling, instead of modelling the composite
series as it is, each of its structural components is
separately modelled.


2.6.1 A Basic Model 

A basic structural model of any series or signal $\{y(k)\}$ has
three components, namely the trend ($y_{tr}$), the seasonal
component ($y_p$), and the irregular component (e):

$$ y(k) = y_{tr}(k) + y_p(k) + e(k). \tag{2.6.1} $$

In terms of frequency descriptions, the trend is the slowly
varying average or low frequency component, and the seasonal
component contains all the frequency components that are
responsible for seasonal variations. As such, the trend and
the seasonal component together should suffice to describe
a seasonal process. The uncertainties and unmodelled
dynamics account for the irregular component. The model is
called structural because its components convey structural
information.

The trend component can be extracted by centred moving
averaging (Appendix 4) or through bidirectional filtering
(Sec.14.3.1) etc. Once the trend component is separated, the
seasonal component in the remaining part of the signal may
be modelled using any suitable method like the trigonometric
functions approach (Sec.2.5.1), the Box and Jenkins approach
(Sec.4.3) or the orthogonal transformation based methods
(discussed in Chapters 7 and 11). Structural models can also
be formulated using the state-space descriptions as shown in
Sec.6.4.

Example: The German unemployment series (Appendix 7E) shows
a trend component associated with a periodic component. In
Example 14.3.1(2), the trend component is separated using
bidirectional filtering (with a = 0.9 in (14.3.1)). The
structural components are shown in Fig.14.4.3.
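
The decomposition (2.6.1) can be sketched for a synthetic series as follows. This is an illustrative Python sketch and not the procedure of Appendix 4 or of Example 14.3.1: it assumes the common centred moving average in which, for an even period, the two end points of the window carry half weight, and it estimates the seasonal pattern as the average detrended value at each position within the period. The data and the function names are hypothetical.

```python
def centred_moving_average(y, period):
    """Trend estimate; entries without a full window are returned as None."""
    half = period // 2
    trend = [None] * len(y)
    for k in range(half, len(y) - half):
        window = y[k - half:k + half + 1]
        if period % 2 == 0:
            # even period: half weight on the two end points of the window
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[k] = total / period
    return trend


def seasonal_pattern(y, trend, period):
    """Average detrended value at each position within the period."""
    sums = [0.0] * period
    counts = [0] * period
    for k, (yk, tk) in enumerate(zip(y, trend)):
        if tk is not None:
            sums[k % period] += yk - tk
            counts[k % period] += 1
    return [s / c for s, c in zip(sums, counts)]


# Hypothetical series: linear trend plus a zero-mean pattern of period 12
pattern = [3.0, 1.0, -2.0, -4.0, -1.0, 2.0, 4.0, 1.0, -1.0, -3.0, -2.0, 2.0]
y = [0.5 * k + pattern[k % 12] for k in range(48)]
trend = centred_moving_average(y, 12)
seasonal = seasonal_pattern(y, trend, 12)
```

For this noise-free series the linear trend and the periodic pattern are recovered exactly at the interior points; with real data the residual y(k) minus trend minus seasonal pattern forms the irregular component e(k).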


2.6.2 Models with Multiple Periodic Components 

The seasonality in the time series need not be explicitly
due to a single periodic component for the structural
modelling to be applicable. There can be multiple periodic
components (with different period lengths) present in the
series, which can be expressed as

$$ y(k) = y_{tr}(k) + y_{p1}(k) + y_{p2}(k) + \dots + y_{pN}(k) + e(k), $$

where each sequence $\{y_{pi}(k)\}$ (for i = 1 to N) is periodic
with a repeating pattern; the magnitudes of the patterns of
$\{y_{pi}(k)\}$ over different periodic segments may not be the
same.

The trend component may be separated the same way as in
Sec.2.6.1. The remaining part of the signal will constitute
a combination of multiple repetitive pattern components and
noise. The successive separation of the periodic components
$y_{pi}$ (i = 1 to N) can be performed using singular value
decomposition as discussed in Sec.11.4; each of the periodic
components as well as the trend may now be modelled
individually as desired.

Typically a periodic component, $y_{pi}$, may be modelled as

$$ \{y_{pi}(k)\} = \{u_{1(pi)}\,v_{1(pi)}^T\}, $$

where $v_{1(pi)}^T$ represents the periodic pattern of the i-th
periodic component, and the elements of $u_{1(pi)}$ are the
scaling factors associated with the successive periods; the
sequence of elements of $u_{1(pi)}$ may be modelled as a time
series.

Example 11.4.5(1) studies a case where a noisy
composite signal has been decomposed and three periodic
components have been produced. The results are presented in
Figs.11.4.2 and 11.4.3. The modelling of the individual
periodic components is treated in Secs.7.7-7.8.


2.7 CONCLUDING REMARKS 

The aim of this chapter has been to introduce the broad 
perspective of modelling. 

The model design is based on the available data and
other information on the process. So the representativeness
of the model depends both on the representativeness of the
data and the accuracy of the information available, and on the
way the available information is used in modelling.

There is no best design of a model; one rather
heuristically aims at the best possible compromise between
the various contributing factors and hopes that the model will
behave like the actual process. Some of the important
factors are the model structure, the uncertainty associated
with the model, and the long-term validity of the model.

A real-life problem may not conform to the familiar 
model structures. The problem has to be appropriately 
configured (e.g., with suitable preprocessing of the data 
etc.), so that the imprecision or vagueness in the 
underlying assumptions is minimized. It is important that 
the modelling problem is so formulated that the estimation 
of the parameters with an acceptable degree of precision is 
feasible. In this connection, the characterization of the 
uncertainty or noise associated with the model deserves 
careful consideration. There is a common tendency to assume 
convenient idealistic characteristics of the residual 
component referred to as noise, to satisfy the 
preconditions of the validity of the estimation methods. It 
is important for the user that the assumptions used are 
known and their validity is ensured; in real life the 
validity of the idealistic assumptions about the nature of 
the data and the noise can be only approximately assured and 
hence the resulting estimates work out to be approximately 
true. 

The numerical robustness of the estimation and other
operations used for modelling also deserves careful
consideration, as this concerns the validity of the model.

The model should preferably be as simple as possible. 
The ultimate objective is to produce a model which is 
representative and which continues to remain valid, with 
relevant adaptation if necessary. 


REFERENCES 

Remarks: There are many books on various aspects of process 
modelling, only some of which are listed here. Two 
authoritative texts on stochastic processes are [4,13]. A 
broad coverage of modelling methods appears in [11], and 
modelling, along with estimation aspects, is detailed in 
[8,2,5]. Books covering specific application areas are 
another class, e.g., [1,15]; state-space modelling has been 
studied in [3,16], GMDH is explored in [6]; many books are 
devoted to the area of neural network modelling, e.g., [14]. 
Models based on frequency domain analysis are detailed in 
many texts, for example [9,10,12]. Structural modelling 
features in [7]; see also Sec.11.5.


CHAPTER 3


PARAMETER ESTIMATION 


Correct parameterization and estimation ensures 
the representativeness of a model. 


3.1 INTRODUCTION 

System identification is a prerequisite to adaptive 
prediction and control; it concerns the generation (for 
example through specific experimentation) and collection of 
information, revealing the characteristic behaviour of the
process, and development of a mathematical representation of 
the process. Thus while parameter estimation concerns the 
determination of the numerical values of the parameters of 
the process model which best describe the dynamics of the 
process, identification involves model structure selection, 
collection of relevant information, parameter estimation, 
and model validation. The nature of the model is very much 
process and problem dependent, as discussed in Chapter 2. 
This chapter is primarily devoted to the problem of 
parameter estimation, model order selection and validation. 

There are different methods of parameter estimation. 
The suitability of a method depends on the quality of 
information contained in the data, the conceptual model 
structure and the application concerned. A detailed study of 
the estimation methods is beyond the scope of this book. The 
discussions are focused mainly on the least squares (LS) 
method, which is a basic method for parameter estimation. 
The quality of the estimates is shown to depend on the
nature of the noise, and the richness of the information
contained in the data. Both the off-line and the recursive 
implementations of the LS estimator are presented, and the 
computational aspects are studied. It has been shown that 
using orthogonal decomposition, ill-conditioned LS estima- 
tion problems can also be solved. 

One constraint of the LS method is that the noise needs
to be uncorrelated with the measurement of the dependent
variable; there are many other estimation methods which can
be used without any such restriction. This chapter includes 
discussions on three such methods, namely, the instrumental 
variable method, the maximum likelihood method, and the 
Koopmans-Levin method implemented using the singular value 
decomposition. The instrumental variable method avoids the 
requirement of the noise to be uncorrelated with the data, 
through the introduction of some auxiliary variables 
referred to as instrumental variables. The maximum 
likelihood method involves nonlinear optimization of the 
statistical information contained in the data; the method 
can produce parameter estimates with most of the desired 
properties, although at the cost of relatively intensive 
computation. The Koopmans-Levin method can handle noise 
associated with both the dependent as well as the 

independent variables, and can produce estimates equivalent 
to approximate maximum likelihood estimators. 

Proper model structure selection and testing of the
validity of a model are necessary for representative
modelling. The study includes the Akaike Information
Criterion (AIC), the methods of robust modelling using
subset selection, and cross validation. The Akaike
Information Criterion, which provides a statistical estimate
of the appropriateness of a model, is a popular method for
model order selection. Cross validation permits the test
for the validation of a model through the use of a set of
data which has not been used for parameter estimation.
Subset selection can be used for the selection of specific
variables to form the most representative model. Two classes
of subset selection have been studied: (a) subset selection
from an information set, and (b) selection of independent
variables in a regression problem. The implementation of
subset selection through singular value decomposition and
some special forms of QR factorization has been discussed.
Subset selection is a powerful approach with enormous
application prospects in the areas of identification,
estimation and control; it has been applied in varied
classes of problems in this book.

The linear regression and the least squares estimation
methods are developed in Sec.3.2. The computational aspects
discussed in Sec.3.3 concentrate on numerically robust and
computationally efficient implementations; there is particular
stress on LS estimation using orthogonal transformation.
Estimation with orthogonalized regressors is also presented.
The recursive LS algorithm and its implementation are
treated in Sec.3.4. This is followed by an exposure to the
instrumental variable method, the maximum likelihood method,
and the Koopmans-Levin method of estimation, in Sec.3.5.
Finally AIC, model selection and cross validation are
treated in Sec.3.6, wherein the problem of best subset AR
modelling and the application of subset selection in
regression problems are also discussed.


3.2 LINEAR REGRESSION AND THE LEAST SQUARES METHOD 

Linear regression concerns linear-in-the-parameters 
representation, relating one or more independent variables with 
the dependent variable; least squares (LS) is a method for 
the estimation of the parameters of the linear regression. 
The term independent means 'independently appearing' in the 
regression equation; the independent variables may not be 
statistically independent. The present discussion on the LS 
estimation is confined to the off-line method only; the 
on-line method of recursive least squares estimation is 
discussed in Sec. 3.4. 

Linear regression 
Consider the following models: 

(a) $$ y(k) + a_1 y(k-1) + \cdots + a_N y(k-N) = b_1 u(k-1) + \cdots + b_N u(k-N) + e(k), \tag{3.2.1} $$

(b) $$ y(i) = \beta_1 x_1(i) + \cdots + \beta_p x_p(i) + e(i). \tag{3.2.2} $$

In (3.2.1), $y(k)$ is the dependent or the response variable, 
and $y(j)$ and $u(j)$, for $j = (k-1)$ to $(k-N)$, are the variables 
on which $y$ depends; $e(k)$ is the noise or the modelling error 
term. For input-output processes, $y$ and $u$ are the output and 
the input measurements respectively. In (3.2.2), the i-th 
observation of the dependent variable $y$ is expressed in 
terms of the i-th observations of the variables $x_1$ to $x_p$. 
Note that the models (3.2.1) and (3.2.2) have the same 
algebraic structure. 

A concise vector expression for (3.2.1) is given by 

$$ y(k) = h^T(k)\theta + e(k), \tag{3.2.3} $$

where $h$, the data vector, and $\theta$, the parameter vector, are 
given by 

$$ h(k) = [\,-y(k-1) \;\; -y(k-2) \;\; \cdots \;\; -y(k-N) \;\; u(k-1) \;\; \cdots \;\; u(k-N)\,]^T, $$

$$ \theta = [\,a_1 \;\; a_2 \;\; \cdots \;\; a_N \;\; b_1 \;\; b_2 \;\; \cdots \;\; b_N\,]^T. $$

Let $\theta$ be assumed to be an n-vector. The models (3.2.1), 
(3.2.2) or (3.2.3) are called the linear regression models, 
and the data elements, that is the elements of $h(k)$ in 
(3.2.3), are called the regressors. Following (3.2.3), if the 
measurements are available over the time $(k-m+1)$ to $k$: 

$$ y(k) = h^T(k)\theta + e(k), $$
$$ y(k-1) = h^T(k-1)\theta + e(k-1), $$
$$ y(k-2) = h^T(k-2)\theta + e(k-2), $$
$$ \vdots $$
$$ y(k-m+1) = h^T(k-m+1)\theta + e(k-m+1), $$

or in concise matrix notation 

$$ y = H\theta + e, \tag{3.2.4} $$

where 

$$ y = \begin{bmatrix} y(k) \\ y(k-1) \\ \vdots \\ y(k-m+1) \end{bmatrix}, \quad
H = \begin{bmatrix} h^T(k) \\ h^T(k-1) \\ \vdots \\ h^T(k-m+1) \end{bmatrix}, \quad
e = \begin{bmatrix} e(k) \\ e(k-1) \\ \vdots \\ e(k-m+1) \end{bmatrix}, $$

and the n parameters, $\theta$, are based on all the data available 
from time $k$ to $(k-m+1)$. The model (3.2.4) is referred to as 
the linear multiple regression model. 
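
The stacking of (3.2.3) into (3.2.4) can be sketched in code. The following is a minimal illustration, assuming NumPy is available; the function name `build_regression` and the simulated second-order process are hypothetical choices made for this sketch, not part of the text. The rows are stacked in increasing time order, which does not affect the estimate.

```python
import numpy as np

def build_regression(y, u, N):
    """Stack y(k) = h(k)^T theta + e(k), k = N..len(y)-1, into (H, y_vec)."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    rows = []
    for k in range(N, len(y)):
        past_y = [-y[k - j] for j in range(1, N + 1)]   # -y(k-1) ... -y(k-N)
        past_u = [u[k - j] for j in range(1, N + 1)]    #  u(k-1) ...  u(k-N)
        rows.append(past_y + past_u)
    return np.array(rows), y[N:]

# Simulate a noise-free, stable second-order process of the form (3.2.1)
rng = np.random.default_rng(0)
a = [-1.5, 0.7]          # assumed a_1, a_2
b = [1.0, 0.5]           # assumed b_1, b_2
u = rng.standard_normal(200)
y = np.zeros(200)
for k in range(2, 200):
    y[k] = -a[0] * y[k - 1] - a[1] * y[k - 2] + b[0] * u[k - 1] + b[1] * u[k - 2]

H, y_vec = build_regression(y, u, N=2)
theta_hat, *_ = np.linalg.lstsq(H, y_vec, rcond=None)
print(H.shape, theta_hat)
```

With noise-free data and a persistently changing input, the estimate should coincide with the parameters used in the simulation, in the order [a_1, a_2, b_1, b_2].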

Remarks 

(1) The term linear in linear regression implies that the 
expression is linear in the parameters $\theta$, and in the error 
$e$, in (3.2.3). Nonlinearly transformed observations may be 
used as the data (i.e. in $h$ in (3.2.3)), for example, 

$$ y(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n, $$

where $a_i$ are the parameters and $1, t, t^2, \ldots, t^n$ are 
the regressors. 

(2) $e(k)$ in (3.2.3) may comprise the measurement error in 
$y(k)$, external disturbances or the modelling error in $h^T\theta$. 


3.2.1 Formulation of LS Estimator 


Given m sets of observations $y(k)$ and $H(k)$, the objective is 
to estimate the n parameters of $\theta$ in (3.2.4). The least 
squares estimation is based on the minimization of the 
scalar cost function, 

$$ J(\theta) = \tfrac{1}{2}\sum_{i=0}^{m-1}\big(y(k-i) - h^T(k-i)\theta\big)^2 = \tfrac{1}{2}[y - H\theta]^T[y - H\theta]. \tag{3.2.5} $$

The minimization of $J(\theta)$ with respect to $\theta$ is given by 

$$ \frac{\partial J(\theta)}{\partial\theta} = [y - H\theta]^T[-H] = 0. \tag{3.2.6} $$

Again 

$$ \frac{\partial^2 J(\theta)}{\partial\theta^2} = H^TH \ge 0, $$

since the matrix $H^TH$ is positive semidefinite, and hence the 
function $J(\theta)$ has a minimum, given by (3.2.6). Hence the 
least squares estimate $\hat{\theta}$ can be obtained as 

$$ H^TH\hat{\theta} = H^Ty, \tag{3.2.7} $$

$$ \hat{\theta} = [H^TH]^{-1}H^Ty. \tag{3.2.8} $$
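
As a numerical illustration of (3.2.5)-(3.2.8), the following minimal sketch (assuming NumPy; the random data and the name `cost` are illustrative only) solves the normal equation and checks that the gradient of the cost vanishes at the estimate, which is the same as the residual being orthogonal to the columns of H.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 3
H = rng.standard_normal((m, n))
theta_true = np.array([2.0, -1.0, 0.5])           # assumed true parameters
y = H @ theta_true + 0.1 * rng.standard_normal(m)

def cost(theta):
    """Scalar cost J(theta) of (3.2.5)."""
    r = y - H @ theta
    return 0.5 * r @ r

# Solve the normal equation (3.2.7) without forming an explicit inverse
theta_hat = np.linalg.solve(H.T @ H, H.T @ y)

# Gradient of J at theta_hat, cf. (3.2.6); it should vanish up to round-off
gradient = -(y - H @ theta_hat) @ H
print(theta_hat, gradient, cost(theta_hat))
```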


Remarks 

(a) Following (3.2.6), 

$$ [y - H\hat{\theta}]^TH = e^TH = 0, $$

which is the orthogonality condition that LS estimation must 
satisfy, that is the LS estimate $\hat{\theta}$ has to be so chosen that 
the consequent error vector $e$ is orthogonal to each of the 
columns of H. The equation (3.2.7) is called the normal 
equation. 

The LS estimation (3.2.8) is a linear transformation on 
$y$, and hence the LS estimator is referred to as a linear 
estimator. 

(b) The LS estimator requires $m > n$, that is the number of 
measurements (m) to be greater than the number of parameters 
(n); usually $m \gg n$. If H has a full column rank, or if m = n, 
or if the rank of H = min(m, n), the LS solution (3.2.8) is 
unique. If rank(H) < n, the estimated parameters will not 
be meaningful. 

(c) Following (3.2.3), the term $h^T(k-i)\hat{\theta}$ may be considered 
to be the estimate or prediction $\hat{y}(k-i)$. Hence, the error 
vector, $e = y - \hat{y}$, may be considered to be the prediction 
error vector; so the LS method may be referred to as a 
prediction error method. 

(d) Although $y$ and H are stated as in (3.2.4), the data in 
$y$ or H need not be ordered in any particular sequence. 


3.2.2 Features and Properties 

The quality of the LS estimates depends on 

(i) the richness of information contained in H(k), and 

(ii) the statistical properties of the noise sequence 
{e(k)}. 

All the characteristic features are related to these two 
factors. 

Richness and quality of information in the data 

It is desirable that the data are rich in information, so 
that complete representation of the process dynamics is 
available to the estimator through the data; at the same 
time, the data should be balanced to the desired extent. 

Rank deficiency in LS estimation 

From an algebraic point of view, lack of information in the 
LS estimation shows up in the rank deficiency of H. In this 
connection the following aspects need some attention. 

(a) multicollinearity between the different regressors, 

(b) the data being too steady, 

(c) some independent variables being nearly orthogonal to 
the dependent variable y. 

Remark: Multicollinearity 

In mathematics, collinearity means the property of several 
points being on the same line. The term multicollinearity or 
simply collinearity refers to one regressor or a set of 
regressors (collectively) being a linear function of a set 
of other regressors. Collinearity in H will show up in 
one column (or a set of columns) of H being a linear 
combination of other columns. Note that collinearity is a 
phenomenon. Collinearity or near collinearity has to be 
avoided for proper estimation of parameters. 

Algebraically, (a) will show up in redundancy in the 
columns of H, (b) will show up in redundancy between the 
rows of H, and both can result in H being singular and $H^TH$ 
being noninvertible; such classes of rank deficient LS 
estimation problems are discussed in Sec. 3.3.3. It is 
desirable to eliminate from the regression problem the 
variables that cause rank redundancy in H. 

It will be ideal if the regressor vectors are mutually 
orthogonal; in such cases, the parameters associated with 
the individual regressors will remain unchanged even if some 
regressors are dropped. 

The set of relatively steady data contains little 
information, and hence should preferably be eliminated in 
advance. For the same reason, the estimation of a constant 
term (i.e. with unity in the corresponding column in the 
data matrix H) is avoided; a model with integrating noise 
structure (for example an ARIMA model, as discussed in 
Sec. 2.3.2) can absorb and exclude such constant terms. For a 
causal process it is desirable to have an input which is 
constantly changing (also referred to as 'persistently 
exciting'), and is expected to be able to excite all the 
modes of the process. If the data are rich, the information 
matrix $H^TH$ will be nonsingular. 

Contrary to the conventional understanding, regressor 
variables which are orthogonal to $y$ are not redundant, and 
should not be excluded from the estimator formulation. The 
relationship within linear regression is a group phenomenon; 
it is not possible to ascertain the relative importance of 
individual regressors through tests like correlation 
analysis etc. against the output variable. Subset selection 
in regression is discussed in Secs. 3.6.2-3.6.4. 

Uniformity of the data 

The columns of H should be balanced, that is should contain 
energy of the same order. This can be ensured through 
normalization of the individual columns. The normalization 
of the columns is also necessary before subset selection. 

The LS estimates can be seriously affected by outliers, 
that is abnormally large or small observations, which must 
be eliminated or truncated in advance. 





Weighted least squares estimation 

Differential importance may be ascribed to the data in H by 
appropriate weighting introduced in the cost function: 

$$ J_w(\theta) = \tfrac{1}{2}[y - H\theta]^TW[y - H\theta]. $$

If $W = I$, $J_w(\theta) = J(\theta)$. If $W = \mathrm{diag}[w_1, w_2, w_3, \ldots, w_m]$, 

$$ J_w(\theta) = \tfrac{1}{2}\sum_{i=1}^{m} w_i\, e^2(k-i+1). $$

The elements of W are appropriately chosen to increase (or 
decrease) the influence of the concerned data set on the 
least squares estimates given by 

$$ \hat{\theta} = [H^TWH]^{-1}H^TWy. $$
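
A minimal sketch of the weighted estimate for a diagonal W follows, assuming NumPy; the function name `weighted_ls`, the straight-line data and the choice of a zero weight for one corrupted observation are illustrative assumptions, not taken from the text.

```python
import numpy as np

def weighted_ls(H, y, w):
    """Weighted LS estimate [H^T W H]^{-1} H^T W y for W = diag(w)."""
    w = np.asarray(w, dtype=float)
    HtW = H.T * w                      # equals H.T @ np.diag(w), without forming W
    return np.linalg.solve(HtW @ H, HtW @ y)

rng = np.random.default_rng(2)
H = np.column_stack([np.ones(30), np.arange(30.0)])   # constant and ramp regressors
y = H @ np.array([1.0, 0.5]) + 0.05 * rng.standard_normal(30)
y[-1] += 20.0                          # one grossly corrupted observation

w = np.ones(30)
theta_unweighted = weighted_ls(H, y, w)   # W = I reduces to ordinary LS
w[-1] = 0.0                               # remove the influence of the corrupted row
theta_weighted = weighted_ls(H, y, w)
print(theta_unweighted, theta_weighted)
```

Setting a weight to zero is the extreme case; intermediate weights simply scale the contribution of the corresponding squared error to the cost.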


Noise characteristics and properties of the estimator 

The LS estimation does not incorporate any probabilistic or 
statistical assumptions. However, the statistical attributes 
of the error function $e$, the square of which is minimized by 
the LS estimator, have to be defined in order to establish 
the statistical properties like bias, error covariance, 
consistency etc. of the LS estimator. 

(a) If the noise vector $e$ in (3.2.4) is zero mean with 
known positive definite covariance matrix R, that is 

$$ E[e] = 0, \quad E[ee^T] = R, \tag{3.2.9} $$

and if $e$ and H are statistically independent, the estimated 
parameters will be unbiased as shown below. 

Let the parameter estimation error be given by 

$$ \tilde{\theta} = \theta_0 - \hat{\theta}, $$

where 

$$ y = H\theta_0 + e, \quad \theta_0 \text{ being the true parameters,} $$

$$ \hat{\theta} = [H^TH]^{-1}H^Ty, \quad \hat{\theta} \text{ being the LS estimate.} $$

The orthogonality between $e$ and H is implicit with the LS 
parameter estimation. $\tilde{\theta}$ can be expressed as 

$$ \tilde{\theta} = \theta_0 - [H^TH]^{-1}H^T[H\theta_0 + e] = -[H^TH]^{-1}H^Te. $$




Hence 

$$ E[\tilde{\theta}] = -[H^TH]^{-1}H^TE[e] = 0. $$

Hence the estimated parameters are unbiased, that is 

$$ E[\hat{\theta}] = E[\theta_0]. $$

The covariance of the parameter estimation error is given by 

$$ E[\tilde{\theta}\tilde{\theta}^T] = P = E\big[[H^TH]^{-1}H^Tee^TH[H^TH]^{-1}\big] = [H^TH]^{-1}H^TRH[H^TH]^{-1}. \tag{3.2.10} $$

Thus the error covariance matrix P, which is indicative of 
the performance of the estimator, does not depend on the 
observations $y$. 

Remark: Simple unbiasedness of the parameters is not very 
meaningful; the variance should be low too. 

(b) If the elements of the noise vector $e$ are also Gaussian 
white with identical variances $\sigma^2$, that is 

$$ R = \sigma^2 I, $$

and hence 

$$ P = \sigma^2(H^TH)^{-1}, \tag{3.2.11} $$

the least squares estimate is unbiased as well as consistent, 
and is the same as the best linear unbiased estimate 
(BLUE) and the maximum likelihood estimate. 

Note that the estimate $\hat{\theta}$ is said to be consistent, if 
it attains the value $\theta_0$ asymptotically, that is if 

$$ \lim_{N\to\infty} \mathrm{trace}\{\sigma^2(H^TH)^{-1}\} = 0, $$

which follows from (3.2.11) in the present case; the trace 
of a square matrix A with elements $a_{ij}$ is given by 
$\mathrm{trace}(A) = \sum_i a_{ii}$. 

A consistent estimate need not be unbiased. 
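
The expressions (3.2.10) and (3.2.11) can be checked numerically. The sketch below, assuming NumPy, evaluates both formulas for white noise and compares them with the sample covariance of the estimation error over repeated simulated experiments; the regression matrix, the noise level and the number of trials are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
m, sigma = 40, 0.5
H = np.column_stack([np.ones(m), np.linspace(0.0, 1.0, m)])
theta0 = np.array([1.0, -2.0])                   # assumed true parameters

HtH_inv = np.linalg.inv(H.T @ H)
R = sigma**2 * np.eye(m)                         # white noise covariance
P_general = HtH_inv @ H.T @ R @ H @ HtH_inv      # (3.2.10)
P_white = sigma**2 * HtH_inv                     # (3.2.11)

# Monte Carlo: repeat the experiment with a fresh noise vector each time
trials = 20000
E = sigma * rng.standard_normal((trials, m))     # one noise vector per row
Y = theta0 @ H.T + E                             # each row is one record y
theta_hats = Y @ H @ HtH_inv                     # row-wise LS estimates
errors = theta0 - theta_hats                     # estimation error per trial
mean_error = errors.mean(axis=0)
P_empirical = errors.T @ errors / trials
print(mean_error, P_white, P_empirical)
```

The sample mean of the error should be close to zero (unbiasedness) and the sample covariance close to P, within the statistical scatter expected for the chosen number of trials.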


3.3 LS ESTIMATION: COMPUTATIONAL ASPECTS 

There are mainly two approaches to off-line LS estimation: 

(i) solving normal equations, and 

(ii) orthogonal LS estimation. 


3.3.1 Solving Normal Equations 

$\theta$ may be estimated directly by solving the normal equation 
(3.2.7): 

$$ \hat{\theta} = [H^TH]^{-1}H^Ty. $$

The direct solution suffers from poor numerical stability, 
particularly against round-off errors, because of the 
explicit inversion $[H^TH]^{-1}$. This inversion can be avoided by 
using Cholesky factorization. For any symmetric positive 
definite matrix $H^TH$, the Cholesky factorization is given by 

$$ H^TH = LDL^T = [LD^{1/2}][LD^{1/2}]^T = GG^T, \tag{3.3.1} $$

where L is a lower triangular matrix with unity diagonal 
elements, D is a diagonal matrix with positive elements, and 
G is a lower triangular matrix. The estimation procedure is 
as follows. 

(a) Compute $f = H^Ty$. 

(b) Compute $H^TH$ and perform Cholesky factorization: 

$$ H^TH = GG^T, $$

yielding the normal equations 

$$ GG^T\hat{\theta} = f. \tag{3.3.2} $$

G is a lower triangular matrix and invertible, assuming H to 
have full column rank (so that $H^TH$ is invertible). 

(c) Define $z = G^T\hat{\theta}$, and solve for $z$ in 

$$ Gz = f. \tag{3.3.3} $$

(d) Solve for $\hat{\theta}$ in 

$$ G^T\hat{\theta} = z. \tag{3.3.4} $$

This approach of estimation belongs to a class called the 
square root algorithms. 

Remarks 

(1) G being triangular, equations (3.3.3) and (3.3.4) can 
be solved without performing explicit inversion of G. For 
example, from Ax = y, x can be solved as follows (x and y 
being n-vectors): 

(a) if A is lower triangular: 

    x(1) = y(1)/A(1,1) 

    For i = 2 to n 

        x(i) = (y(i) - \sum_{j=1}^{i-1} A(i,j)x(j))/A(i,i); 

(b) if A is upper triangular: 

    x(n) = y(n)/A(n,n) 

    For i = n-1 down to 1 

        x(i) = (y(i) - \sum_{j=i+1}^{n} A(i,j)x(j))/A(i,i); 

(2) Nonuniqueness of square root factors of a matrix does 
not influence the estimation through (3.3.3) and (3.3.4). 
For example, 

$$ GG^T = GWW^TG^T = [GW][GW]^T, $$

for any W, where $WW^T = I$. 
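
Steps (a)-(d) and the triangular solves of Remark (1) may be sketched as follows, assuming NumPy. The function names are illustrative; `np.linalg.cholesky` is used for the factorization and returns the lower triangular factor G.

```python
import numpy as np

def forward_substitution(A, y):
    """Solve A x = y for lower triangular A, without inverting A."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n):
        x[i] = (y[i] - A[i, :i] @ x[:i]) / A[i, i]
    return x

def back_substitution(A, y):
    """Solve A x = y for upper triangular A, without inverting A."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

def ls_cholesky(H, y):
    """LS estimate through Cholesky factorization of H^T H."""
    f = H.T @ y                         # step (a)
    G = np.linalg.cholesky(H.T @ H)     # step (b): G G^T = H^T H, G lower triangular
    z = forward_substitution(G, f)      # step (c): G z = f, cf. (3.3.3)
    return back_substitution(G.T, z)    # step (d): G^T theta = z, cf. (3.3.4)

rng = np.random.default_rng(4)
H = rng.standard_normal((30, 4))
y = rng.standard_normal(30)
theta_hat = ls_cholesky(H, y)
print(theta_hat)
```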


3.3.2 Orthogonal LS Estimation 

The use of orthogonal decomposition for LS estimation has 
been widely studied (Lawson and Hanson, 1974; Golub and Van 
Loan, 1989). These approaches are extremely well conditioned 
numerically, and can lead to robust estimation, although at 
the cost of relatively increased computation. Orthogonal LS 
estimation using QR decomposition (Appendix 3B) and Singular 
Value Decomposition (Sec. 7.6) are discussed in this section. 

Estimation using QR decomposition 

The m×n regression matrix H is decomposed as 

$$ H = QR, $$

where the columns of the m×m matrix Q are orthonormal (i.e. 
$Q^TQ = I$), and the m×n matrix R is upper triangular, with an 
invertible upper n×n block. Hence 

$$ H^TH = R^TQ^TQR = R^TR. \tag{3.3.5} $$

The normal equation (3.2.7) becomes 

$$ R^TR\hat{\theta} = R^TQ^Ty. $$

So the LS estimate $\hat{\theta}$ is obtained by solving 

$$ R\hat{\theta} = Q^Ty. \tag{3.3.6} $$

Unlike SVD, QR decomposition cannot be used for rank deficient 
LS estimation problems, that is when H is not a full 
rank matrix; however, QR with column pivoting factorization 
may be used in such cases as discussed in the next section. 


Remarks 

(1) Equations (3.3.1) and (3.3.5) show magnitude-wise 
equivalence between $\bar{R}$, the upper triangular part of R of QR 
decomposition, and G of Cholesky factorization as $G = \bar{R}^T$. 

(2) As a comparison of numerical stability, if the desired 
computational precision is $\epsilon^2$ for the direct solution of 
the normal equation (3.2.7), the required precision with the 
orthogonal approach is $\epsilon$ only, although the latter entails 
a computational load almost twice that of the former. 
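
A minimal sketch of QR-based estimation and of the equivalence noted in Remark (1), assuming NumPy and random illustrative data. A general solver is used for the triangular system for brevity; a dedicated triangular solver would exploit the structure.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 20, 4
H = rng.standard_normal((m, n))
y = rng.standard_normal(m)

Q, R = np.linalg.qr(H, mode="complete")   # Q is m x m, R is m x n
R_bar = R[:n, :]                          # upper triangular n x n block of R
c = (Q.T @ y)[:n]                         # first n elements of Q^T y
theta_qr = np.linalg.solve(R_bar, c)      # R_bar theta = c, cf. (3.3.6)

G = np.linalg.cholesky(H.T @ H)           # H^T H = G G^T, cf. (3.3.1)
print(theta_qr)
print(np.abs(G) - np.abs(R_bar.T))        # should be close to zero
```

The signs of the rows of the QR factor are not unique, which is why the comparison with G is made on magnitudes only.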


Estimation using singular value decomposition (SVD) 


SVD is one of the most robust tools for LS estimation. SVD 
is discussed in detail in Sec. 7.6. The present discussions 
are confined to the LS estimation only. 

SVD of any m×n matrix, H, is given by 

$$ H = USV^T, \tag{3.3.7} $$

where the m×m matrix U and the n×n matrix V are orthogonal: 
$U^TU = UU^T = I$, $V^TV = VV^T = I$; S is diagonal with nonincreasing 
elements (referred to as singular values), ordered down the diagonal: 

$$ S = \begin{bmatrix} \mathrm{diag}(s_1, s_2, \ldots, s_n) \\ 0 \end{bmatrix}, \quad s_1 \ge s_2 \ge \cdots \ge s_n \ge 0. $$

Using (3.3.7) on the normal equations, 

$$ H^TH\hat{\theta} = H^Ty, $$

$$ VSU^TUSV^T\hat{\theta} = VSU^Ty, $$

$$ V^T\hat{\theta} = S^{-1}U^Ty. $$

Hence the LS estimate is given by 

$$ \hat{\theta} = VS^{-1}U^Ty. \tag{3.3.8} $$

S being a diagonal matrix, the elements of $S^{-1}$ are the 
inverse of the corresponding elements of S. Thus the 
computation in (3.3.8) effectively involves matrix 
multiplications only. 

The unique feature of LS estimation using SVD is that H 
need not be a full rank matrix. LS estimation involves two 
steps (3.3.7) and (3.3.8). Estimation in case of rank 
deficient H is discussed in the following section. 

Remark: U need not be explicitly formed for computing $\hat{\theta}$ in 
(3.3.8), as $U^Ty$ can be directly used in the implementation; 
this can reduce computational load. 
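
The two steps (3.3.7) and (3.3.8) can be sketched as below, assuming NumPy and a full rank H. The economy-size SVD is requested so that only the first n columns of U are formed; the function name `ls_svd` is illustrative.

```python
import numpy as np

def ls_svd(H, y):
    """LS estimate V S^{-1} U^T y, cf. (3.3.8), for a full rank H."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)   # economy SVD, (3.3.7)
    c = U.T @ y                                        # projections of y on the columns of U
    return Vt.T @ (c / s)                              # scale by 1/s_i, then rotate by V

rng = np.random.default_rng(6)
H = rng.standard_normal((25, 4))
y = rng.standard_normal(25)
theta_hat = ls_svd(H, y)
print(theta_hat)
```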


Example 3.3.2 Estimate the parameters of the model 

$$ y(k) = a_0 + a_1x_1(k) + a_2x_2(k) + a_3x_3(k) + a_4x_4(k) + e(k), $$

given the following data 

Table 3.3.2 Cement curing data (Hald, 1952, Sec.20.3) 

Observation        y      x_1     x_2     x_3     x_4 
     1          78.50     7.0    26.0     6.0    60.0 
     2          74.30     1.0    29.0    15.0    52.0 
     3         104.30    11.0    56.0     8.0    20.0 
     4          87.60    11.0    31.0     8.0    47.0 
     5          95.90     7.0    52.0     6.0    33.0 
     6         109.20    11.0    55.0     9.0    22.0 
     7         102.70     3.0    71.0    17.0     6.0 
     8          72.50     1.0    31.0    22.0    44.0 
     9          93.10     2.0    54.0    18.0    22.0 
    10         115.90    21.0    47.0     4.0    26.0 
    11          83.80     1.0    40.0    23.0    34.0 
    12         113.30    11.0    66.0     9.0    12.0 
    13         109.40    10.0    68.0     8.0    12.0 

These data concern the relation between the heat evolved 
during the hardening of certain cements ($y$) and four 
independent variables: $x_1$, the percentage of tricalcium 
aluminate, $x_2$, the percentage of tricalcium silicate, $x_3$, 
the percentage of calcium aluminium ferrate, and $x_4$, the 
percentage of dicalcium silicate. 

Here H is a 13×5 matrix, where the elements of the 
first column are all 1's (corresponding to the parameter 
$a_0$), and the values of $x_1$ to $x_4$ are contained in the next 
four columns of H. The estimation of the parameters $\theta$ in 
$y = H\theta + e$ using different approaches is presented here. 

Exercise 1: Estimation using Cholesky factorization 

$$ f = H^Ty = [0.1241 \;\; 1.0032 \;\; 6.2028 \;\; 1.3982 \;\; 3.4733]^T \times 10^4, $$

$$ G = \begin{bmatrix}
3.6056 & 0 & 0 & 0 & 0 \\
26.9030 & 20.3772 & 0 & 0 & 0 \\
173.6212 & 12.3215 & 52.4774 & 0 & 0 \\
42.4346 & -18.2859 & 1.1199 & 12.5172 & 0 \\
108.1665 & -14.2316 & -54.6073 & -12.8688 & 3.4497
\end{bmatrix}, $$

$$ z = [344.0528 \;\; 38.0799 \;\; 34.7532 \;\; 3.1295 \;\; -0.4970]^T. $$

The parameters estimated from (3.3.4) are given by 

$$ \hat{\theta} = [62.4054 \;\; 1.5511 \;\; 0.5102 \;\; 0.1019 \;\; -0.1441]^T. $$

The sum of residual-square = 47.8636. 


Exercise 2: Estimation using QR decomposition 

The component R in H = QR is given by 

$$ R = \begin{bmatrix}
-3.6056 & -26.9030 & -173.6212 & -42.4346 & -108.1665 \\
0 & 20.3772 & 12.3215 & -18.2859 & -14.2316 \\
0 & 0 & -52.4774 & -1.1199 & 54.6073 \\
0 & 0 & 0 & -12.5172 & 12.8688 \\
0 & 0 & 0 & 0 & -3.4497
\end{bmatrix}. $$

Here Q is a 13×13 matrix. The parameters estimated using 
(3.3.6) are the same as obtained through the Cholesky 
factorization above. Note the magnitude-wise equivalence 
between the upper triangular part of R and $G^T$. 

Exercise 3: Estimation using SVD 

SVD of H produces: $H = USV^T$, where the singular values are 

211.3675, 77.2361, 28.4597, 10.2674, 0.0349, 

and 

$$ V = \begin{bmatrix}
0.0170 & -0.0037 & 0.0000 & 0.0110 & -0.9998 \\
0.1279 & 0.0428 & -0.6459 & 0.7513 & 0.0103 \\
0.8397 & 0.5092 & -0.0181 & -0.1876 & 0.0103 \\
0.1984 & -0.0721 & 0.7557 & 0.6199 & 0.0105 \\
0.4888 & -0.8565 & -0.1067 & -0.1263 & 0.0101
\end{bmatrix}. $$

Here U is a 13×13 matrix. The parameters obtained using 
(3.3.8) are the same as in the earlier cases. 
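
The three exercises above can be reproduced with a short script. The following is a sketch, assuming NumPy; the data are those of Table 3.3.2, while the variable names and the use of general-purpose solvers for the triangular systems are choices made for this illustration.

```python
import numpy as np

# Cement data of Table 3.3.2; columns: x1, x2, x3, x4, y
data = np.array([
    [ 7, 26,  6, 60,  78.5], [ 1, 29, 15, 52,  74.3], [11, 56,  8, 20, 104.3],
    [11, 31,  8, 47,  87.6], [ 7, 52,  6, 33,  95.9], [11, 55,  9, 22, 109.2],
    [ 3, 71, 17,  6, 102.7], [ 1, 31, 22, 44,  72.5], [ 2, 54, 18, 22,  93.1],
    [21, 47,  4, 26, 115.9], [ 1, 40, 23, 34,  83.8], [11, 66,  9, 12, 113.3],
    [10, 68,  8, 12, 109.4],
])
X, y = data[:, :4], data[:, 4]
H = np.column_stack([np.ones(len(y)), X])     # first column of 1's for a_0

# Exercise 1: Cholesky factorization of H^T H
G = np.linalg.cholesky(H.T @ H)
z = np.linalg.solve(G, H.T @ y)               # G z = f
theta_chol = np.linalg.solve(G.T, z)          # G^T theta = z

# Exercise 2: QR decomposition (economy size)
Q, R = np.linalg.qr(H)
theta_qr = np.linalg.solve(R, Q.T @ y)

# Exercise 3: SVD
U, s, Vt = np.linalg.svd(H, full_matrices=False)
theta_svd = Vt.T @ ((U.T @ y) / s)

rss = np.sum((y - H @ theta_svd) ** 2)        # sum of residual-square
print(theta_chol, theta_qr, theta_svd, rss)
```

The three estimates should agree with each other and with the values reported in Exercise 1 to the printed precision.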


Exercise 4: Estimation with normalized data 

If the data columns are normalized to unity mean and unit 
range (for example, for column elements $y_i$: 

$$ y_i(\text{normalized}) = 1.0 + (y_i - y_{mean})/(y_{max} - y_{min})), $$

$H_n$ becomes as follows. 

$$ H_n = \begin{bmatrix}
1.0000 & 0.9769 & 0.5077 & 0.6964 & 1.5556 \\
1.0000 & 0.6769 & 0.5744 & 1.1700 & 1.4074 \\
1.0000 & 1.1769 & 1.1744 & 0.8016 & 0.8148 \\
1.0000 & 1.1769 & 0.6188 & 0.8016 & 1.3148 \\
1.0000 & 0.9769 & 1.0855 & 0.6964 & 1.0556 \\
1.0000 & 1.1769 & 1.1521 & 0.8543 & 0.8519 \\
1.0000 & 0.7769 & 1.5077 & 1.2753 & 0.5556 \\
1.0000 & 0.6769 & 0.6188 & 1.5385 & 1.2593 \\
1.0000 & 0.7269 & 1.1299 & 1.3279 & 0.8519 \\
1.0000 & 1.6769 & 0.9744 & 0.5911 & 0.9259 \\
1.0000 & 0.6769 & 0.8188 & 1.5911 & 1.0741 \\
1.0000 & 1.1769 & 1.3966 & 0.8543 & 0.6667 \\
1.0000 & 1.1269 & 1.4410 & 0.8016 & 0.6667
\end{bmatrix} $$

The singular values of $H_n$ are 

$s_1$ to $s_5$: 8.0652, 1.6726, 1.3962, 0.4420, 0.0217, 

and the estimated parameters are 

$$ \hat{\theta} = [47.2865 \;\; 31.0221 \;\; 22.9575 \;\; 1.9363 \;\; -7.7793]^T. $$

The sum of residual-square (= 47.8636) is the same as with 
unnormalized data. 

Remarks: In the last exercise, since the columns of $H_n$ are 
normalized, it appears from $\hat{\theta}$ that the variables $x_3$ and $x_4$ 
are relatively insignificant in this problem. If $x_3$ and $x_4$ 
are rejected, the singular values of the truncated (i.e. 
13×3) $H_n$ are (6.3264, 0.9911, 0.6832) and the estimated 
parameters are $\hat{\theta} = [36.2557 \;\; 29.3661 \;\; 29.8013]^T$. 
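
The normalization used in Exercise 4 can be sketched as follows, assuming NumPy. The helper name `normalize_columns` is illustrative; it scales each regressor column to unity mean and unit range, leaving y and the column of 1's unchanged, as in the exercise.

```python
import numpy as np

# Regressors x1..x4 and response y of Table 3.3.2
X = np.array([
    [ 7, 26,  6, 60], [ 1, 29, 15, 52], [11, 56,  8, 20], [11, 31,  8, 47],
    [ 7, 52,  6, 33], [11, 55,  9, 22], [ 3, 71, 17,  6], [ 1, 31, 22, 44],
    [ 2, 54, 18, 22], [21, 47,  4, 26], [ 1, 40, 23, 34], [11, 66,  9, 12],
    [10, 68,  8, 12],
], dtype=float)
y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.8, 113.3, 109.4])

def normalize_columns(X):
    """Return 1 + (x - mean)/(max - min) for every column of X."""
    span = X.max(axis=0) - X.min(axis=0)
    return 1.0 + (X - X.mean(axis=0)) / span

Hn = np.column_stack([np.ones(len(y)), normalize_columns(X)])
theta_n, *_ = np.linalg.lstsq(Hn, y, rcond=None)
rss_n = np.sum((y - Hn @ theta_n) ** 2)

# Estimate with the unnormalized data, for comparison
H = np.column_stack([np.ones(len(y)), X])
theta, *_ = np.linalg.lstsq(H, y, rcond=None)
span = X.max(axis=0) - X.min(axis=0)
print(theta_n, rss_n)
```

Since the normalization is an affine change of each regressor, the fitted values and the residual sum are unchanged, and each normalized slope equals the original slope multiplied by the range of the corresponding column.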


3.3.3 Rank Deficient LS Estimation 


Rank deficient LS estimation refers to H being rank 
deficient in the estimation problem 

$$ y = H\theta + e. $$

As discussed in Sec. 3.2.2, the rank deficiency of H can be 
due to collinearity between different regressors, or due to 
the different sets of data constituting the rows of H being 
too steady or unchanging. When H is rank deficient, there 
are an infinite number of solutions to the LS estimation 
problem. There are two possibilities for the size of $\theta$: 

(i) it can be considered to be n, where H is m×n (m > n), 

(ii) it can be considered to be r, the rank of H; 

SVD can be used in the former case, while subset selection 
followed by SVD can be used in the latter. 

In the present case, there are two basic problems: 

(1) determination of the rank of H, and 

(2) the solution of the LS estimation problem for the full 
or the truncated parameter vector. 

Rank of a matrix 

Singular value decomposition (see Sec. 7.6) provides the most 
direct and definite method for the determination of the rank 
of a matrix. SVD of an m×n matrix (m > n) H produces n 
singular values arranged in non-increasing order: 

$$ s_1 \ge s_2 \ge \cdots \ge s_r \ge \cdots \ge s_n \ge 0. $$

Here, 

(i) the number ($r \le n$) of nonzero, that is non-negligible, 
singular values will be indicative of the rank (r) of the 
matrix; 

(ii) the smallest nonzero or non-negligible singular value 
gives the distance of the matrix H from the set of all 
(further) rank deficient matrices. 

It is implied that if the rank of H is r (< n), the 
singular values $s_{r+1}, \ldots, s_n$ are all zero or negligibly small 
compared with $s_r$. So rank determination will require 
declaration of the tolerance $\delta$, where 

$$ s_r > \delta \ge s_{r+1} \ge \cdots \ge s_n \ge 0. $$

Remarks 

(a) If the m×n matrix H is of rank r (r < min(m,n)), i.e. if H is 
noninvertible, its n×m pseudo-inverse is given by 

$$ H^{\dagger} = VS^{\dagger}U^T, $$

where the n×m matrix $S^{\dagger} = \mathrm{diag}(1/s_1, \ldots, 1/s_r, 0, \ldots, 0)$. If 
the rank of H = n, then $H^{\dagger} = (H^TH)^{-1}H^T$. If the rank of H 
= m = n, $H^{\dagger} = H^{-1}$. 

(b) Rank of a matrix can also be determined using URV 
decomposition cited by Stewart (1992); this method is not 
explored in this book. 

LS estimation using SVD 

The estimation procedure can be summarized as follows: 

(1) Perform SVD of H and determine r (r < min(m,n)), the rank 
of H, with 

$$ s_1 \ge s_2 \ge \cdots \ge s_r \quad \text{and} \quad s_{r+1} \ge s_{r+2} \ge \cdots \ge s_n \ge 0, \quad r \le n, $$

where $s_{r+1}, \ldots, s_n$ are zero or negligibly small compared 
with $s_r$. 

(2) Truncate U, $S^{-1}$ and V to $\bar{U}$, $\bar{S}^{-1}$ and $\bar{V}$ of dimensions 
m×r, r×r and n×r respectively, and 

(3) Compute the LS estimate of the n-parameter vector 

$$ \hat{\theta} = \bar{V}\bar{S}^{-1}\bar{U}^Ty. \tag{3.3.9} $$
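
The three steps can be sketched as below, assuming NumPy. The function name `truncated_svd_ls`, the tolerance value and the exactly collinear test matrix are illustrative assumptions; the tolerance plays the role of the threshold separating non-negligible from negligible singular values.

```python
import numpy as np

def truncated_svd_ls(H, y, tol):
    """Rank-r LS estimate (3.3.9); r is the number of singular values above tol."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    r = int(np.sum(s > tol))                   # step (1): numerical rank
    c = U[:, :r].T @ y                         # step (2): truncated U
    theta = Vt[:r].T @ (c / s[:r])             # step (3): truncated V and S^{-1}
    return theta, r

rng = np.random.default_rng(7)
x1 = rng.standard_normal(30)
x2 = rng.standard_normal(30)
H = np.column_stack([x1, x2, x1 + x2])         # third column is exactly collinear
y = 2.0 * x1 - 1.0 * x2

theta, r = truncated_svd_ls(H, y, tol=1e-8)
print(r, theta)
```

Among the infinitely many parameter vectors that reproduce y here, the truncated SVD solution is the one of minimum norm, which is also what the pseudo-inverse delivers; it need not coincide with the parameters used to generate the data.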

Remarks 

(a) SVD can provide the solution to the rank deficient LS 
estimation problem, which is optimum in minimum residual 
norm sense. Following (3.3.9), the minimum squared residual 
norm is given by 

$$ \|y - H\hat{\theta}\|_2^2 = \sum_{i=r+1}^{m} c_i^2, \quad c_i = u_i^Ty, $$

where $U = [u_1 \; u_2 \ldots u_i \ldots u_m]$, $u_i$ being m-column vectors. 
Although use of SVD ensures the algebraic validity of rank 
deficient LS solution, the meaningfulness of the estimates 
is not ensured. 

(b) An alternative expression for (3.3.9) is given by 

$$ \hat{\theta} = [\theta_1 \; \theta_2 \ldots \theta_n]^T = \sum_{i=1}^{r}(c_i/s_i)v_i, $$

where $v_i$ are the columns of the truncated V, that is the 
first r n-column vectors of V: 

$$ V = [v_1 \ldots v_i \ldots v_r \ldots v_n], $$

$$ \bar{V} = [v_1 \ldots v_i \ldots v_r]. $$

LS estimation through subset selection 

In case of collinearity, the redundant columns of H should 
be eliminated before parameter estimation, otherwise 
erroneous parameters will be estimated even if SVD is used 
for solving the LS estimation problem. Subset selection 
(Secs. 3.6.2 and 3.6.4) provides an easy method for the 
elimination of redundant regressors in the case of 
collinearity. Here, the selection of a subset of an information 
set (i.e. H) is discussed; subset selection in regression 
problems (i.e. with $y$ taken into consideration) is discussed 
in Sec. 3.6.4. 

If H is rank deficient, SVD followed by QR with column 
pivoting (QRcp) factorization may be used to estimate an 
r-vector (r being the rank of H) LS estimate of the 
parameters. 

The basic idea is to selectively extract an m×r subset 
$\bar{H}$ from the m×n H through subset selection, discussed in 
Sec. 3.6.2. Now the LS estimation problem is approximated as 
follows: 

$$ y = H\theta + e \approx \bar{H}\bar{\theta} + e', \tag{3.3.10} $$

where $\bar{H}$, which consists of r columns of H, is a full rank 
matrix. The LS estimate of the reduced r-vector $\bar{\theta}$ is 
determined by solving (3.3.10) using SVD. 
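
A rough sketch of this procedure follows, assuming NumPy. Since NumPy has no pivoted QR routine, the column pivoting is imitated here by a greedy Gram-Schmidt selection (the hypothetical helper `pivot_order`), applied to the transpose of the first r columns of V; this is an illustrative stand-in for QRcp, not the routine of Appendix 3B. The synthetic data and the tolerance are likewise assumptions.

```python
import numpy as np

def pivot_order(A, r):
    """Pick r columns of A greedily: largest remaining norm, then deflate."""
    A = np.array(A, dtype=float)
    chosen = []
    for _ in range(r):
        norms = np.sum(A**2, axis=0)
        j = int(np.argmax(norms))
        chosen.append(j)
        q = A[:, j] / np.sqrt(norms[j])
        A = A - np.outer(q, q @ A)         # remove the component along the chosen column
    return chosen

def subset_ls(H, y, tol):
    """Select r columns of H (r = numerical rank) and solve the reduced problem (3.3.10)."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    r = int(np.sum(s > tol))
    cols = sorted(pivot_order(Vt[:r], r))  # selection on the r x n matrix V_bar^T
    theta_bar, *_ = np.linalg.lstsq(H[:, cols], y, rcond=None)   # SVD-based solver
    return cols, theta_bar

rng = np.random.default_rng(8)
x1 = rng.standard_normal(50)
x2 = rng.standard_normal(50)
x3 = x1 + 0.01 * rng.standard_normal(50)                 # nearly equal to x1
x4 = 0.5 * (x1 + x2) + 0.01 * rng.standard_normal(50)    # nearly the average of x1 and x2
H = np.column_stack([x1, x2, x3, x4])
y = 0.3 * x1 + 0.7 * x2

cols, theta_bar = subset_ls(H, y, tol=1.0)
residual_rms = np.sqrt(np.mean((y - H[:, cols] @ theta_bar) ** 2))
print(cols, theta_bar, residual_rms)
```

Which two columns are retained depends on their relative energies; the point of the selection is only that the retained pair is well conditioned and still explains y.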


Example 3.3.3 Compute the parameters of the model 

$$ y(k) = a_1x_1(k) + a_2x_2(k) + a_3x_3(k) + a_4x_4(k), $$

given the following data 

Table 3.3.3 Synthetic data 

    y      x_1     x_2     x_3     x_4 
   1.2     3.3     0.4     3.5     2.0 
   0.4     4.5    -1.5     4.6     1.5 
   1.0     5.4    -0.9     5.6     2.3 
   1.3     3.6     0.3     3.6     2.0 
  -2.0     5.1    -4.9     5.3     0.2 
   0.5     6.0    -1.9     6.1     2.1 
   1.3     2.7     0.7     2.8     1.7 
  -2.4     3.6    -5.0     3.4    -0.8 

Here the 8×4 data matrix H is given by the columns of $x_1$, 
$x_2$, $x_3$, $x_4$. SVD of H produces the singular values: 

$s_1$ to $s_4$: 18.5603, 6.7683, 0.2431, 0.0851 

So the rank of H is effectively 2. U and V are given by 

$$ U = \begin{bmatrix}
-0.2415 & -0.3184 & -0.3526 & 0.3666 & 0.2548 & -0.1152 & -0.3151 & 0.6357 \\
-0.3552 & -0.0885 & 0.2039 & -0.7315 & 0.4059 & -0.3304 & -0.0838 & 0.0916 \\
-0.4150 & -0.2563 & -0.1128 & -0.1632 & -0.8246 & -0.1840 & -0.0943 & -0.0040 \\
-0.2575 & -0.3252 & 0.3483 & 0.4546 & 0.1880 & -0.2977 & -0.2249 & -0.5704 \\
-0.4562 & 0.4761 & -0.6308 & 0.0579 & 0.1638 & -0.0746 & 0.0963 & -0.3495 \\
-0.4708 & -0.1256 & 0.1708 & -0.0436 & 0.0801 & 0.8480 & -0.0697 & -0.0349 \\
-0.1893 & -0.3290 & 0.0155 & 0.1278 & 0.0843 & -0.0802 & 0.9035 & 0.0977 \\
-0.3338 & 0.6071 & 0.5222 & 0.2750 & -0.1209 & -0.1551 & 0.0620 & 0.3594
\end{bmatrix}, $$

$$ V = \begin{bmatrix}
0.0027 & -0.5905 & 0.4622 & -0.6616 \\
-0.6696 & -0.1525 & 0.5266 & 0.5012 \\
0.2909 & -0.7715 & -0.3063 & 0.4758 \\
-0.6834 & -0.1813 & -0.6444 & -0.2911
\end{bmatrix}. $$

Exercise 1: 

(a) Using (3.3.8), the estimated parameters work out to be 

$a_1$ = 0.6888, $a_2$ = 1.1330, $a_3$ = 0.0409, $a_4$ = 0.8320. 

The estimated $y$ is obtained as 

$$ \hat{y} = [1.2053 \;\; 0.34 \;\; 1.015 \;\; 1.3027 \;\; -1.9888 \;\; 0.4821 \;\; 1.3529 \;\; -2.3809]^T $$

and the MSE = 0.0009. 

(b) Since only two singular values are large, the parameters 
may be estimated using (3.3.9). The result is as follows. 

$a_1$ = 0.0600, $a_2$ = 0.5420, $a_3$ = 0.0792, $a_4$ = 0.3107. 

The estimated $y$ is obtained as 

$$ \hat{y} = [1.3136 \;\; 0.2876 \;\; 0.9946 \;\; 1.2853 \;\; -1.8677 \;\; 0.4661 \;\; 1.2915 \;\; -2.4732]^T $$

and the MSE = 0.0062. 

Exercise 2 

Since only 2 singular values are large, subset selection may 
be used to select the two significant columns of H. So subset 
selection is performed on $\bar{V}^T$, where $\bar{V}$ consists of the first 2 
columns of V. QRcp factorization produces the permutation 
matrix 

$$ \begin{bmatrix}
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}. $$

Hence the 2nd and the 3rd columns of H are selected (i.e. 
variables $x_2$ and $x_3$ are selected). So the problem can be 
solved using (3.3.10), where $\bar{H}$ is an 8×2 matrix. SVD of 
$\bar{H}$ ($= U'S'V'^T$) produces the singular values: 

$s_1$ = 13.8217, and $s_2$ = 5.2735. 

The parameter set, given by $V'S'^{-1}U'^Ty$, is determined as 

$a_2$ = 0.6957, and $a_3$ = 0.2920. 

The estimated $y$ is obtained as 

$$ \hat{y} = [1.3001 \;\; 0.2995 \;\; 1.0089 \;\; 1.2597 \;\; -1.8613 \;\; 0.4592 \;\; 1.3044 \;\; -2.4856]^T $$

and the MSE = 0.0063. 

Remarks 

(1) In this simulation example, the data are generated so 
that $x_3$ is almost equal to $x_1$, and $x_4$ is the approximate 
average of $x_2$ and $x_1$. So the approximate rank of H, detected 
as 2, is correct. 

(2) Here, $y$ is in fact noise corrupted $y^*$, where the 
elements of $y^*$ are given by 

$$ y^* = 0.3x_1 + 0.7x_2. $$

It can be verified that if $y = y^*$, where 

$$ y^* = [1.27 \;\; 0.30 \;\; 0.99 \;\; 1.29 \;\; -1.90 \;\; 0.47 \;\; 1.30 \;\; -2.42]^T, $$

the parameters work out to be $a_1$ = 0.3, $a_2$ = 0.7, $a_3$ = 0.0 
and $a_4$ = 0.0, irrespective of the values of $x_3$ and $x_4$, 
provided the problem is not ill-conditioned. For example if 
one or two singular values are negligibly small, even with 
$y = y^*$, incorrect parameters will result from (3.3.8); for 
correct estimation either (3.3.9) has to be used or subset 
selection has to be performed and then (3.3.10) has to be 
solved. The result with subset selection will be more 
meaningful as the model will not be overparameterized. 

(3) The subset selection in Exercise 2 above selects $x_3$ 
instead of $x_1$, because the energy $\sum x_3^2(k)$ is greater than 
$\sum x_1^2(k)$. It is desirable that the columns are normalized 
before subset selection; in such a case the regressor vector 
most strongly correlated with $y$ should be selected first in 
QRcp factorization (see Sec. 3.6.4 and Appendix 3B). 


3.3.4 Estimation with Orthogonalized Regressors 

An LS estimation problem, $y = H\theta + e$, is ideally formulated 
if the independent variables, that is the regressors given 
by the columns of H, are mutually orthogonal, but are not 
orthogonal to the vector of dependent variable $y$. In such 
cases, the estimated parameters are independent of the model 
order. 

Note that orthogonalization and orthogonal 
transformation are discussed in detail in Chapter 7. 

Orthogonality 

Two signals $z_i(k)$ and $z_j(k)$ are mutually orthogonal, if 

$$ \sum_k z_i(k)z_j(k) = \begin{cases} c, & i = j, \\ 0, & i \ne j, \end{cases} \quad c \text{ being a constant.} $$


Orthogonal polynomial regression 

Consider the nth order polynomial 

$$ y(k) = \sum_{i=1}^{n}\theta_ix_i(k) + e(k), \quad k = 1, \ldots, m, \tag{3.3.11} $$

where $x_i$ are the independently appearing variables, $y$ is the 
dependent variable and $e$ is the error term. 

Using matrix notations 

$$ y = X\theta + e, \tag{3.3.12} $$

where 

$$ y = \begin{bmatrix} y(1) \\ \vdots \\ y(m) \end{bmatrix}, \quad
X = [x_1 \ldots x_i \ldots x_n], \quad
x_i = \begin{bmatrix} x_i(1) \\ \vdots \\ x_i(m) \end{bmatrix}, \quad
\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}, \quad
e = \begin{bmatrix} e(1) \\ \vdots \\ e(m) \end{bmatrix}. $$

No sequencing is assumed in $y$ or X. Let the orthogonalized 
expression for the independent variables X be given by Z: 

$$ y = Z\beta + e; \tag{3.3.13} $$

that is $Z\beta = X\theta$, where 

$$ Z = [z_1 \ldots z_i \ldots z_n], \quad
z_i = \begin{bmatrix} z_i(1) \\ \vdots \\ z_i(m) \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_n \end{bmatrix}. $$

Equation (3.3.13) can be restated as 

$$ y(k) = \sum_{i=1}^{n}\beta_iz_i(k) + e(k), \quad k = 1, \ldots, m. \tag{3.3.14} $$

It is implied that the mutually orthogonal set of vectors 
$\{z_1 \ldots z_i \ldots z_n\}$ are produced from the linearly independent 
set of vectors $\{x_1 \ldots x_i \ldots x_n\}$ so that for p = 1, ..., n, the 
set $\{z_1 \ldots z_i \ldots z_p\}$ spans the same p-dimensional subspace as 
the set $\{x_1 \ldots x_i \ldots x_p\}$. The polynomials (3.3.14) are 
called orthogonal polynomials. The set $\{z_1 \ldots z_i \ldots z_n\}$ 
constitutes a nonsingular linear transformation of the set 
$\{x_1 \ldots x_i \ldots x_n\}$, which is the maximum number of linearly 
independent polynomials over the range k = 1, ..., m. 

Implementation using SVD

The SVD of an m×n matrix A is given by $A = USV^T$, where
$U = [u_1, \ldots, u_i, \ldots, u_n]$, $V = [v_1, \ldots, v_i, \ldots, v_n]$,
$u_i$ and $v_i$ being m- and n-vectors respectively; the diagonal
matrix S contains the singular values:
$s_1 \geq s_2 \geq \ldots \geq s_p \geq 0$, $p = \min(m,n) = n$.

A can be orthogonalized to Z as

$$Z = AV = US, \qquad (3.3.15)$$

$$Z = [z_1, \ldots, z_i, \ldots, z_n],$$

where $z_i$ are m-column vectors of the m×n matrix Z. Thus

$$A = USV^T = ZV^T = \sum_{i=1}^{n} u_i s_i v_i^T = \sum_{i=1}^{n} z_i v_i^T. \qquad (3.3.16)$$

Hence the LS estimation problem is given by

$$y = H\theta + e = ZV^T\theta + e = Z\beta + e, \qquad (3.3.17)$$

where $\beta = V^T\theta$ and the SVD is that of H. The LS
estimate of $\beta$ is given by

$$\hat{\beta} = S^{-1}U^Ty. \qquad (3.3.18)$$

If H is not a full rank matrix, $\beta$ can be estimated as

$$\hat{\beta}' = S^{-1}U^Ty,$$

where S and U are the truncated r×n and m×r matrices
respectively, r being the rank of H; $\hat{\beta}'$ is an n-vector.
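
The SVD route of (3.3.15) to (3.3.18) can be sketched numerically as below. This is only an illustration under stated assumptions: NumPy is assumed to be available, the function name `orthogonal_ls_svd` and the random test data are hypothetical and not taken from the book, and H is assumed to have full column rank.

```python
import numpy as np

def orthogonal_ls_svd(H, y):
    """Orthogonalize H through its SVD and return Z, beta and theta."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)  # H = U diag(s) Vt
    Z = U * s                 # Z = H V = U S; columns are mutually orthogonal
    beta = (U.T @ y) / s      # beta = S^{-1} U^T y, as in (3.3.18)
    theta = Vt.T @ beta       # theta = V beta, because beta = V^T theta
    return Z, beta, theta

# Illustrative data (assumed values, not from the text).
rng = np.random.default_rng(0)
H = rng.standard_normal((50, 3))
y = H @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(50)

Z, beta, theta = orthogonal_ls_svd(H, y)
theta_ref = np.linalg.lstsq(H, y, rcond=None)[0]      # ordinary LS, for comparison

# Re-fitting with only the first two orthogonal regressors gives the same
# coefficients as the first two entries of beta.
beta_2 = np.linalg.lstsq(Z[:, :2], y, rcond=None)[0]
```

The last line illustrates the property stated at the start of this subsection: with orthogonal regressors the estimated parameters do not depend on the model order.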

Implementation using QR factorization 
The QR factorization of mxn matrix H is given by 
$$H = QR,$$

where the m×n matrix Q has orthonormal columns and R is upper
triangular; R will be invertible if the columns of H are
linearly independent, that is if H is a full rank matrix. Then

$$y = H\theta + e = QR\theta + e = Q\alpha + e, \quad \alpha = R\theta;$$

the n-parameter vector $\alpha$ is estimated as

$$\hat{\alpha} = Q^Ty, \quad \text{since } Q^TQ = I.$$

This approach requires H to be a full rank matrix. 
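
A minimal sketch of the QR route follows, assuming NumPy and a full rank H. The function name `orthogonal_ls_qr` and the straight-line data are illustrative assumptions, not part of the original text.

```python
import numpy as np

def orthogonal_ls_qr(H, y):
    """LS estimation through the reduced QR factorization H = QR."""
    Q, R = np.linalg.qr(H)              # Q is m x n with orthonormal columns
    alpha = Q.T @ y                     # alpha = Q^T y, since Q^T Q = I
    theta = np.linalg.solve(R, alpha)   # recover theta from alpha = R theta
    return alpha, theta

# Illustrative data: y = 2 + 3 t sampled at t = 0, 1, 2, 3 (assumed example).
H = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 5.0, 8.0, 11.0])
alpha, theta = orthogonal_ls_qr(H, y)
```

Solving the triangular system for theta is only needed when the original parameters are of interest; the orthogonal parameters alpha are obtained without any matrix inversion.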

Applications 

Two direct applications of orthogonal LS estimation follow. 

(1) Synthesis of signals: Following (3.3.14) and (3.3.16-3.3.17),

$$y = \sum_{i=1}^{n} z_i\beta_i, \qquad (3.3.19)$$

where the columns $z_i$ are orthogonalized. So given a set of
orthogonal vectors (or polynomials) $z_i$, the vector (or
signal sequence) y may be synthesized with appropriate
choices of $\beta$.

(2) Analysis of signals: Once the LS estimate $\hat{\beta}$ is
computed, it remains valid for any model order, that is for
any chosen set of regressors $z_i$ in (3.3.19). In other words,
any number of terms $z_i\beta_i$ may be added to reconstruct y,
without having to re-estimate $\beta$, unlike the case of
nonorthogonal LS estimation where for every change of the model
order or $\{z_i\}$ sequence, the parameters $\theta$ have to be
re-estimated.
Remarks

(a) One drawback of orthogonal LS estimates is that when the
information matrix H is appended with new information to form the
$m_1 \times n$ matrix $H_1$, the earlier estimates $\hat{\beta}$ cannot
be used directly to compute the corresponding $\hat{y}_1$. $H_1$ has
to be orthogonalized to $Z_1$ to use $\hat{\beta}$. Alternatively the
earlier estimates have to be retained as $\hat{\theta} = V\hat{\beta}$,
which can be used directly with $H_1$.

(b) The physical interpretation of the columns of H is lost
in the orthogonalization to Z.

(c) Different transformations are possible for the 
orthogonalization of H, and the transformed matrix Z is not 
unique. 

(d) Z together with V contains the full information in H. 


3.4 RECURSIVE LEAST SQUARES METHOD 

The least squares method discussed in the last section
requires a block of data for estimation, which is a
disadvantage, because

(a) at each sampling time or discrete time instant, as a new 
observation is available, the size of the data block will 
grow, and hence repeating the LS estimation procedure at 
each time instant with almost the same data (except for the 
new observation) will be largely redundant, 

(b) often the data are available sequentially only and hence
sequential execution of the LS estimation algorithm is
preferable.

At any time instant, given the parameter estimates 
(based on the past data) and the new set of observations, 
the recursive least squares (RLS) method produces the 
updated least squares estimates of the parameters. 


3.4.1 RLS Formulation 

Consider the process model (3.2.1). Given the data on the 
dependent and the independent variables from the time 
(k-m+1) to k, the process is described by (3.2.4) and the 
parameter estimates are given by (3.2.8): 

$$\hat{\theta}(k) = [H^T(k)H(k)]^{-1}H^T(k)\mathbf{y}(k),$$

where $\mathbf{y}(k)$ is the vector of outputs in the data block and
the time index k is introduced to signify the least
squares estimate being based on the available data up to
time k.

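Now, at time (k+1), with an additional set of data
available, the objective is to determine the updated
parameters $\hat{\theta}(k+1)$. Following (3.2.4), at time (k+1), the
process is given by

$$\begin{bmatrix} \mathbf{y}(k) \\ y(k+1) \end{bmatrix} = \begin{bmatrix} H(k) \\ h^T(k+1) \end{bmatrix}\theta(k+1) + \begin{bmatrix} \mathbf{e}(k) \\ e(k+1) \end{bmatrix}, \qquad (3.4.1)$$

where $\mathbf{y}(k)$ and $\mathbf{e}(k)$ denote the output and error
vectors of the data block up to time k, and

$$h(k+1) = [-y(k) \;\; -y(k-1) \; \ldots \; -y(k-N) \;\; u(k) \;\; u(k-1) \; \ldots \; u(k-N)]^T.$$

Similarly to (3.2.8), the least squares estimate is given by

$$\hat{\theta}(k+1) = \left[ \begin{bmatrix} H(k) \\ h^T(k+1) \end{bmatrix}^T \begin{bmatrix} H(k) \\ h^T(k+1) \end{bmatrix} \right]^{-1} \begin{bmatrix} H(k) \\ h^T(k+1) \end{bmatrix}^T \begin{bmatrix} \mathbf{y}(k) \\ y(k+1) \end{bmatrix}. \qquad (3.4.2)$$

Equation (3.4.2) can be simplified to formulate the
recursive least squares estimation law as follows.

$$\hat{\theta}(k+1) = \hat{\theta}(k) + k(k+1)(y(k+1) - h^T(k+1)\hat{\theta}(k)), \qquad (3.4.3)$$

$$k(k+1) = P(k)h(k+1)(1 + h^T(k+1)P(k)h(k+1))^{-1}, \qquad (3.4.4)$$

$$P(k+1) = [I - k(k+1)h^T(k+1)]P(k), \qquad (3.4.5)$$

where k is the Kalman estimator gain and P is the covariance
of the parameter-estimation error.

Remarks: k(k+1) given by (3.4.4) can be computed at time k
itself, since h(k+1) consists of terms available at time k.
Thus at time k+1, as soon as the new measurement y(k+1) is
available, $\hat{\theta}(k+1)$ can be produced. This is followed by
the updating of P(k+1) and k(k+2).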
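
The update law (3.4.3) to (3.4.5) can be sketched in a few lines. The following is a minimal illustration assuming NumPy; the function name `rls_update`, the first-order test process, the large initial covariance and the sign convention of the regressor (positive past output with a positive coefficient) are assumptions made for the example, not taken from the text.

```python
import numpy as np

def rls_update(theta, P, h, y_new):
    """One RLS step following (3.4.3)-(3.4.5)."""
    Ph = P @ h
    gain = Ph / (1.0 + h @ Ph)                    # k(k+1), eq. (3.4.4)
    theta = theta + gain * (y_new - h @ theta)    # eq. (3.4.3)
    P = P - np.outer(gain, h) @ P                 # eq. (3.4.5)
    return theta, P

# Assumed noise-free test process: y(k+1) = 0.8 y(k) + 0.5 u(k).
rng = np.random.default_rng(1)
theta_true = np.array([0.8, 0.5])
theta = np.zeros(2)
P = 1e6 * np.eye(2)        # large initial P: little confidence in theta(0)
y_prev = 0.0
for _ in range(200):
    u = rng.standard_normal()
    h = np.array([y_prev, u])      # regressor built from data available at time k
    y_new = h @ theta_true         # new measurement y(k+1)
    theta, P = rls_update(theta, P, h, y_new)
    y_prev = y_new
```

Note that the gain is computed before the new measurement is used, which mirrors the remark above that k(k+1) depends only on quantities available at time k.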

Derivation of RLS algorithm

Rewriting (3.4.2), dropping most of the arguments and
subscripts for the sake of clarity (H, y and h stand for H(k),
the output vector up to time k and h(k+1) respectively),

$$\hat{\theta}(k+1) = [H^TH + hh^T]^{-1}[H^Ty + h\,y(k+1)]$$

$$= [H^TH]^{-1}H^Ty + \left[[H^TH + hh^T]^{-1} - [H^TH]^{-1}\right]H^Ty + [H^TH + hh^T]^{-1}h\,y(k+1).$$

Since $\hat{\theta}(k) = [H^TH]^{-1}H^Ty$,

$$\hat{\theta}(k+1) = \hat{\theta}(k) + [H^TH + hh^T]^{-1}\left[I - [H^TH + hh^T][H^TH]^{-1}\right]H^Ty + [H^TH + hh^T]^{-1}h\,y(k+1)$$

$$= \hat{\theta}(k) + [H^TH + hh^T]^{-1}\{-[hh^T][H^TH]^{-1}H^Ty + h\,y(k+1)\}$$

$$= \hat{\theta}(k) + [[H^TH + hh^T]^{-1}h](y(k+1) - h^T\hat{\theta}(k)),$$

which is the same as (3.4.3) with

$$k(k+1) = [H^TH + hh^T]^{-1}h. \qquad (3.4.6)$$

The recursive expression for k is formulated as follows.
Consider the matrix inversion lemma (Appendix 1):

$$[A + BCD]^{-1} = A^{-1} - A^{-1}B[C^{-1} + DA^{-1}B]^{-1}DA^{-1}. \qquad (3.4.7)$$

Define the covariance matrix

$$P(k) = [H^T(k)H(k)]^{-1}.$$

Following (3.4.6),

$$k(k+1) = [(P(k))^{-1} + hh^T]^{-1}h \qquad (3.4.8)$$

$$= [P(k+1)]h. \qquad (3.4.9)$$

Using (3.4.7) on the bracketed term in (3.4.8),

$$k(k+1) = [P(k) - P(k)h(1 + h^TP(k)h)^{-1}h^TP(k)]h \qquad (3.4.10)$$

$$= P(k)h(1 - (1 + h^TP(k)h)^{-1}h^TP(k)h)$$

$$= P(k)h(1 - (1 + h^TP(k)h)^{-1}(1 + h^TP(k)h - 1)).$$

Hence

$$k(k+1) = P(k)h(k+1)(1 + h^T(k+1)P(k)h(k+1))^{-1}. \qquad (3.4.11)$$

Again from (3.4.9) and (3.4.10),

$$P(k+1) = P(k) - P(k)h(1 + h^TP(k)h)^{-1}h^TP(k).$$

Hence following (3.4.11)

$$P(k+1) = [I - k(k+1)h^T(k+1)]P(k).$$


3.4.2 Implementation Aspects

The two prime concerns in the implementation of RLS
estimation are (a) representativeness of the data, and (b)
computational correctness.
Adaptive prediction and control requires the estimator
to be able to track the variations in the dynamics of the
process. Hence it is necessary to give relatively more
importance to the more recent data than the old data, which
can be implemented through exponential forgetting of the
data in the parameter estimator. It is expected that the
more recent data are rich in information. When the data lack
information, i.e. when they remain relatively steady or
unchanged, they should not be used for parameter estimation.

Again, from a computational point of view the implemen- 
tation should be numerically well conditioned, and 
computationally robust, for which algorithms involving 
square-root updating of the covariance matrix are used. 

Exponential forgetting 

With exponential forgetting, the RLS estimation laws (3.4.3)
to (3.4.5) become

$$\hat{\theta}(k+1) = \hat{\theta}(k) + k(k+1)(y(k+1) - h^T(k+1)\hat{\theta}(k)),$$

$$k(k+1) = P(k+1)h(k+1) = P(k)h(k+1)(\lambda + h^T(k+1)P(k)h(k+1))^{-1},$$

$$P(k+1) = \frac{1}{\lambda}[I - k(k+1)h^T(k+1)]P(k), \qquad (3.4.12)$$

minimizing the cost criterion

$$J(\theta,k) = \sum_{i=1}^{k} \lambda^{k-i}e^2(i), \quad e(i) = y(i) - h^T(i)\theta,$$

where $0 < \lambda \leq 1$. When $\lambda = 1$, the forgetting remains
inactive, and all the data are equally weighted. When $\lambda < 1$,
the recent data are weighted more than the older data. One
meaningful way of specifying the forgetting factor $\lambda$ is
through the asymptotic sample length ($A_{sl}$):

$$\lambda = 1 - \frac{1}{A_{sl}}, \quad A_{sl} > 1.$$

$A_{sl}$ reflects the number of past samples on which to base the
parameter estimation. For a very slowly time-varying
process, typically $\lambda = 0.99$; for time-varying processes with
stochastic disturbances a smaller $\lambda$ (that is, stronger
forgetting), say $0.95 \leq \lambda < 0.99$, is recommended.
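
The link between the forgetting factor and the asymptotic sample length can be checked with a short calculation. The sketch below uses hypothetical helper names (`forgetting_factor`, `data_weights`) and an assumed sample length of 100; it shows that the weights in the cost criterion sum to approximately the asymptotic sample length once k is large.

```python
def forgetting_factor(asl):
    """Forgetting factor 1 - 1/A_sl for an asymptotic sample length A_sl > 1."""
    if asl <= 1:
        raise ValueError("asymptotic sample length must exceed 1")
    return 1.0 - 1.0 / asl

def data_weights(lam, k):
    """Weights lam**(k - i) applied to e^2(i), i = 1..k, in the cost criterion."""
    return [lam ** (k - i) for i in range(1, k + 1)]

lam = forgetting_factor(100)        # assumed A_sl of 100 samples
weights = data_weights(lam, 2000)
effective_length = sum(weights)     # tends to 1/(1 - lam) = A_sl for large k
```

The most recent sample always has weight one, and a sample that is A_sl steps old has weight close to exp(-1), which is one way to read A_sl as a memory length.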

In RLS estimation, it is important that the estimation
error covariance matrix P is well conditioned. When the
process remains relatively steady with little change in the
data, the data vector h(k+1) may tend to zero and hence
(3.4.12) reduces to

$$P(k+1) = P(k)/\lambda.$$

For $\lambda < 1$, an exponential growth of P(k) will result, and
will cause undesirably large changes in the parameters $\hat{\theta}$
when the data vector again becomes nonzero, which may even be
due to noise. This phenomenon is known as covariance windup.
The remedy is to ensure that P stays bounded, for which
there are various approaches. One way is to stop updating
the parameters when $(y(k+1) - h^T(k+1)\hat{\theta}(k))$, the prediction
error, falls too low. Alternatively, the forgetting can be
inhibited when the information in the data is low.
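
Covariance windup is easy to reproduce in the scalar case. The sketch below is an illustration only: the function names, the threshold `h_min` and the choice to freeze P when the data vector is small are hypothetical, and represent just one of the possible safeguards mentioned above.

```python
def p_update(P, h, lam):
    """Scalar form of the covariance update in (3.4.12)."""
    gain = P * h / (lam + h * P * h)
    return (1.0 - gain * h) * P / lam

def p_update_guarded(P, h, lam, h_min=1e-3):
    """Assumed safeguard: inhibit forgetting when the data carry little information."""
    if abs(h) < h_min:
        return P                 # freeze P instead of dividing by lam
    return p_update(P, h, lam)

lam = 0.95
P_plain = 1.0
P_guarded = 1.0
for _ in range(200):             # process at rest: the data vector is zero
    P_plain = p_update(P_plain, 0.0, lam)
    P_guarded = p_update_guarded(P_guarded, 0.0, lam)
```

After 200 uninformative samples the unguarded covariance has grown by the factor 0.95 to the power -200, while the guarded one is unchanged.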

Numerical stability and robustness 

The covariance update (3.4.5) is numerically ill conditioned;
it is sensitive to computer round-off errors, and the
differencing operation (between positive terms) in (3.4.5)
leads to degradation of computational accuracy.

The numerical stability can be significantly improved
by propagating the covariance matrix in the square root
form. The basic principle is to factorize the covariance
matrix P into $RR^T$, where R is the square root of P, and to
update R at every recursion. The alternative is to use the
$UDU^T$ factorization due to Bierman (1977); the covariance
matrix P is factorized as $UDU^T$, where D is a diagonal matrix,
and U is an upper triangular matrix with 1s on the diagonal.
$UD^{1/2}$ is the square root of P. D and U are propagated
through the recursions instead of P. Besides the numerical
stability, the advantage of the $UDU^T$ factorization is that no
explicit square-root extractions are necessary. Sequential
propagation of the covariance matrix through U-D
factorization is referred to as the U-D covariance measurement
update, which is discussed in Appendix 3.
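
To make the factorized form concrete, the sketch below factorizes a given symmetric positive definite matrix into U and D. It is a plain batch factorization written for illustration, assuming NumPy; it is not Bierman's measurement update algorithm, and the function name `udu_factorize` and the test matrix are assumptions.

```python
import numpy as np

def udu_factorize(P):
    """Factorize symmetric positive definite P as U diag(d) U^T,
    with U upper triangular and 1s on its diagonal."""
    n = P.shape[0]
    U = np.eye(n)
    d = np.zeros(n)
    for j in range(n - 1, -1, -1):
        # The sums run over columns k > j, which have already been computed.
        d[j] = P[j, j] - np.sum(d[j + 1:] * U[j, j + 1:] ** 2)
        for i in range(j):
            U[i, j] = (P[i, j] - np.sum(d[j + 1:] * U[i, j + 1:] * U[j, j + 1:])) / d[j]
    return U, d

# Assumed example of a positive definite covariance matrix.
P = np.array([[4.0, 2.0, 0.6],
              [2.0, 2.0, 0.5],
              [0.6, 0.5, 1.0]])
U, d = udu_factorize(P)
P_rebuilt = U @ np.diag(d) @ U.T
```

No square roots are taken anywhere in the routine, and positive entries in d confirm that the reconstructed matrix is positive definite.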


3.5 SOME SELECTED METHODS: AN INTRODUCTION 

Although the least squares method has attractive convergence 
and asymptotic properties, its main weakness is that the 
estimates will be biased, if the noise is correlated with 
measurements of the dependent variable. There are many 
alternative methods of estimation, a detailed study of which
is beyond the scope of this book; this section introduces 
three such methods belonging to three different classes of 
parameter estimators. 


3.5.1 Instrumental Variable Method 

In this method, the parameters are estimated using a set of
variables called instruments or instrumental variables,
which are correlated with the regression variables but are
uncorrelated with the noise. Consider the process model

$$y(k) + a_1y(k-1) + \ldots + a_Ny(k-N) = b_1u(k-1) + \ldots + b_Nu(k-N) + e(k), \qquad (3.5.1)$$

where y is the measured process output, u is the measured
process input and e is the error or noise. Restating
(3.2.3), the process model (3.5.1) is given by

$$y(k) = h^T(k)\theta + e(k), \qquad (3.5.2)$$

where h, the n-data vector, and $\theta$, the n-parameter vector,
are given by

$$h(k) = [y(k-1) \; \ldots \; y(k-N) \;\; u(k-1) \; \ldots \; u(k-N)]^T, \text{ and}$$

$$\theta = [-a_1 \; \ldots \; -a_N \;\; b_1 \; \ldots \; b_N]^T$$

respectively; the noise is not assumed to be uncorrelated
with y.

Introduce vector w(k) consisting of the instrumental
variables

$$w(k) = [-x(k-1) \;\; -x(k-2) \; \ldots \; -x(k-N) \;\; u(k-1) \; \ldots \; u(k-N)]^T, \qquad (3.5.3)$$

where x(k-1), x(k-2) etc. are the instruments or instrumental
variables, which are chosen so that w(k) is
uncorrelated with $\{e(k)\}$ but is strongly correlated with
h(k). Following (3.5.3) and (3.5.2),

$$w(k)y(k) = w(k)h^T(k)\theta + w(k)e(k). \qquad (3.5.4)$$

Using measurements over m samples,

$$y = H\theta + e, \qquad (3.5.5)$$

or, premultiplying by $W^T$,

$$W^Ty = W^TH\theta + W^Te, \qquad (3.5.6)$$

where W is an m×n matrix with rows $w^T(k-m+1), \ldots, w^T(k)$,
H is an m×n matrix with rows $h^T(k-m+1), \ldots, h^T(k)$,
and y and e are m-vectors.

So the instrumental variable estimation follows as

$$\hat{\theta} = (W^TH)^{-1}W^Ty, \qquad (3.5.7)$$

since by hypothesis $E[W^Te] = 0$, and $E[W^TH]$ is positive
definite and hence invertible.

The similarity of (3.5.7) with the LS solution leads to
the recursive formulation of the instrumental variable
method as

$$\hat{\theta}(k) = \hat{\theta}(k-1) + k(k)(y(k) - h^T(k)\hat{\theta}(k-1)), \qquad (3.5.8)$$

where

$$k(k) = P(k-1)w(k)(1 + h^T(k)P(k-1)w(k))^{-1} \qquad (3.5.9)$$

and

$$P(k) = [W^TH]^{-1} = \left[\sum_{i=1}^{k} w(i)h^T(i)\right]^{-1}$$

$$= P(k-1) - P(k-1)w(k)(1 + h^T(k)P(k-1)w(k))^{-1}h^T(k)P(k-1). \qquad (3.5.10)$$

Unlike the recursive least squares case, P is not a symmetric
matrix here.
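
The batch estimate (3.5.7) can be tried on simulated data. The sketch below assumes NumPy and a hypothetical first-order process with coloured noise; the instruments are simply delayed inputs, and the minus sign used for x in (3.5.3) is omitted since it does not affect the estimate. All numerical values are assumptions for illustration.

```python
import numpy as np

def iv_estimate(H, W, y):
    """Batch instrumental variable estimate (W^T H)^{-1} W^T y, as in (3.5.7)."""
    return np.linalg.solve(W.T @ H, W.T @ y)

# Assumed process: y(k) = 0.8 y(k-1) + 1.0 u(k-1) + v(k), v(k) = e(k) + 0.8 e(k-1).
rng = np.random.default_rng(0)
n_samples = 20000
u = rng.standard_normal(n_samples)
e = rng.standard_normal(n_samples)
v = e.copy()
v[1:] += 0.8 * e[:-1]                        # coloured noise, correlated with y(k-1)
y = np.zeros(n_samples)
for k in range(1, n_samples):
    y[k] = 0.8 * y[k - 1] + 1.0 * u[k - 1] + v[k]

idx = np.arange(2, n_samples)
H = np.column_stack([y[idx - 1], u[idx - 1]])    # regressors h(k)
W = np.column_stack([u[idx - 2], u[idx - 1]])    # instruments: u(k-2) stands in for y(k-1)
theta_iv = iv_estimate(H, W, y[idx])
theta_ls = np.linalg.lstsq(H, y[idx], rcond=None)[0]   # ordinary LS, for comparison
```

Because the noise here is correlated with the regressor y(k-1), the ordinary LS estimate is expected to be biased, whereas the instruments are uncorrelated with the noise by construction.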

Choice of instruments 

Different choices are possible for the instruments. One of
the direct ways of generating the instruments is to derive
x(k) from

$$\hat{A}(q^{-1})x(k) = \hat{B}(q^{-1})u(k),$$

where $\hat{A}(q^{-1})$ and $\hat{B}(q^{-1})$ are the estimated polynomials:

$$\hat{A}(q^{-1}) = 1 + \hat{a}_1q^{-1} + \ldots + \hat{a}_Nq^{-N},$$

$$\hat{B}(q^{-1}) = \hat{b}_1q^{-1} + \ldots + \hat{b}_Nq^{-N}.$$

The parameters of the $\hat{A}(q^{-1})$ and $\hat{B}(q^{-1})$ polynomials may be
generated using any sensible method of estimation. One
obvious choice can be to use the RLS estimator, in which
case the instrumental variable vector will be given by

$$w(k) = h^T(k)\hat{\theta}(k).$$

A relatively simpler choice can be to assume $\hat{B}(q^{-1})/\hat{A}(q^{-1})$
introduces a pure lag between u(k) and x(k), leading to

$$w(k) = [-u(k-\tau) \;\; -u(k-\tau-1) \; \ldots \; -u(k-\tau-N+1) \;\; u(k-1) \; \ldots \; u(k-N)]^T.$$

It is expected that the correlation between w(k) and h(k)
will be rendered maximum for a certain value of $\tau$.

The quality of estimation will depend on the way the 
instruments are generated. 

For further discussions on the instrumental variable 
method and its applications refer to Young (1984), and 
Soderstrom and Stoica (1989). 


3.5.2 Maximum Likelihood Method 

The maximum likelihood method is a powerful and versatile 
off-line method, which uses the knowledge of the statistical 
distribution of the observations. 

In any estimation problem, there are two basic 
entities: the data (which may be noisy) and the parameters. 
The objective of parameter estimation is to obtain the set 
of parameter values that best conforms to the data. In other 
words the most likely set of parameters with reference to
the data is to be determined; the validity of the parameters 
is obviously linked with the most likely nature of the data, 
that is the most probable distribution of the data. If the 
parameter set is given, and a set of observations is 
available, the probability density function may be defined 
for the observations. The problem in the case of parameter 
estimation is the reverse. Given a probability density 
function for the distribution of the data, the parameter set 
may be computed through the maximization of a likelihood 
function of the parameters with respect to the observations; 
this is the principle of maximum likelihood estimation. The 
method requires the joint probability density function for 
the observation being predefined, but it is not restricted 
to any particular form of the density function. 

To estimate the parameters $\theta_1, \theta_2, \ldots, \theta_n$, on the
basis of the given m observations $x_1, x_2, \ldots, x_m$, a
likelihood function $L(x,\theta)$ is introduced:

$$L(x_1, x_2, \ldots, x_m \mid \theta_1, \theta_2, \ldots, \theta_n) = \prod_{i=1}^{m} f(x_i \mid \theta_1, \theta_2, \ldots, \theta_n),$$

where $f(x_i \mid \theta_1, \theta_2, \ldots, \theta_n)$, i = 1 to m, is the joint
probability density function for the observations $x_i$. Let x
and $\theta$ stand for the observation vector and the parameter
vector respectively. The value of $\theta_i$, for i = 1 to n, which
maximizes the likelihood function L is taken as the estimate
$\hat{\theta}_i$. Thus the estimate $\hat{\theta}$ is the solution of

$$\frac{\partial L}{\partial \theta_i} = 0, \quad i = 1, \ldots, n.$$

It is assumed that the joint probability distribution of the
possible observations is specified. If there are n parameter
values, $\theta_1, \theta_2, \ldots, \theta_n$, that describe the relationship among
the observations, and if m observations $y = y_1,
y_2, \ldots, y_m$ are drawn from the specified distribution, let
the joint probability density function be denoted by

$$f(y \mid \theta) = f(y_1, y_2, \ldots, y_m \mid \theta_1, \theta_2, \ldots, \theta_n). \qquad (3.5.11)$$


This is a deterministic function of $\theta$, once the observations
are given. It is assumed that the observations are exact
although they may not be free from noise contaminations. The
function (3.5.11) is called the likelihood function, since
it is the measure of the likelihood of the observations
being valid subject to the choice of the parameters $\theta$. In
other words, the estimates of the parameters will be those
which maximize the likelihood of the observations y, that is
which maximize the likelihood function $f(y \mid \theta)$.

So the maximum likelihood estimates are obtained as the
solutions of

$$\frac{\partial f}{\partial \theta_1} = 0, \quad \frac{\partial f}{\partial \theta_2} = 0, \quad \ldots, \quad \frac{\partial f}{\partial \theta_n} = 0.$$

If $y_1, \ldots, y_m$ are independent observations,

$$f(y \mid \theta) = f(y_1 \mid \theta)f(y_2 \mid \theta) \ldots f(y_m \mid \theta).$$

Therefore, for mathematical simplification, the logarithmic
transformation can be introduced; since a logarithm is a
monotonic function of its argument, the value of $\theta$ that
maximizes $f(y \mid \theta)$ also maximizes $\log f(y \mid \theta)$. Hence the
maximum likelihood estimates $\hat{\theta}$ can be obtained by solving

$$\frac{\partial \log f(y \mid \theta)}{\partial \theta_i} = 0 \qquad (3.5.12)$$

for i = 1 to n.
Remark: The probabilistic characterization of the data may
not always be possible in practice, because of the
availability of limited data and little control on the
generation of the data.
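
As a worked illustration of (3.5.12), consider independent Gaussian observations with unknown mean and variance, for which the likelihood equations have a closed-form solution. The Gaussian assumption, the function names and the sample values below are all illustrative choices, not part of the original text.

```python
import math

def gaussian_log_likelihood(samples, mean, var):
    """log f(y|theta) for independent Gaussian observations, theta = (mean, var)."""
    m = len(samples)
    sq = sum((y - mean) ** 2 for y in samples)
    return -0.5 * m * math.log(2.0 * math.pi * var) - sq / (2.0 * var)

def gaussian_ml_estimate(samples):
    """Closed-form solution of d(log f)/d(theta_i) = 0 in the Gaussian case."""
    m = len(samples)
    mean = sum(samples) / m
    var = sum((y - mean) ** 2 for y in samples) / m
    return mean, var

samples = [2.1, 1.9, 2.4, 1.6, 2.0]          # assumed observations
mean_ml, var_ml = gaussian_ml_estimate(samples)
best = gaussian_log_likelihood(samples, mean_ml, var_ml)
```

Evaluating the log-likelihood at any other pair of parameter values gives a smaller number than `best`, which is a simple numerical check that the closed-form solution is the maximizer.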


3.5.3 The Koopmans-Levin Method: Implemented using SVD 

The estimation methods discussed earlier assume the 
measurements of the independent variables being noise free. 
The Koopmans-Levin (KL) method permits noise to be present 
with both the dependent as well as the independent 
variables; such classes of LS problems are referred to as
total least squares problems. This section studies the KL
method of parameter estimation, implemented using the 
singular value decomposition (Fernando and Nicholson, 1985); 
this method can perform like an approximate maximum 
likelihood estimator. 

Basic principle 

Consider an ARMA model for the process: 

$$A(q^{-1})y(k) = B(q^{-1})u(k-1), \qquad (3.5.13)$$

where

$$A(q^{-1}) = 1 + a_1q^{-1} + \ldots + a_Nq^{-N},$$

$$B(q^{-1}) = b_0 + b_1q^{-1} + \ldots + b_Nq^{-N}.$$

The variables y and u are available as contaminated
measurements y' and u' respectively, where

$$y'(k) = y(k) + e_1(k), \text{ and } u'(k) = u(k) + e_2(k).$$

It is assumed that the noise sequences $\{e_1(k)\}$ and $\{e_2(k)\}$
are independent, zero-mean, white noise sequences with known
statistical characteristics:

$$E\{e_1(k)\} = 0, \quad E\{e_2(k)\} = 0,$$

$$E\{e_1(k)e_1(j)\} = \begin{cases} \sigma_1^2, & k = j, \\ 0, & k \neq j, \end{cases} \qquad E\{e_2(k)e_2(j)\} = \begin{cases} \sigma_2^2, & k = j, \\ 0, & k \neq j, \end{cases}$$

$$E\{e_1(k)e_2(j)\} = 0 \text{ for all } k, j.$$





Define the n-parameter vector

$$\theta = [-1 \;\; -a_1 \; \ldots \; -a_N \;\; b_0 \;\; b_1 \; \ldots \; b_N]^T. \qquad (3.5.14)$$

Define the true input-output vector g(k) and the observed
input-output vector h(k) as follows:

$$g(k) = [y(k) \;\; y(k-1) \; \ldots \; y(k-N) \;\; u(k-1) \;\; u(k-2) \; \ldots \; u(k-N-1)]^T, \qquad (3.5.15)$$

$$h(k) = [y'(k) \;\; y'(k-1) \; \ldots \; y'(k-N) \;\; u'(k-1) \; \ldots \; u'(k-N-1)]^T, \qquad (3.5.16)$$

with

$$h(k) = g(k) + [e_1^T(k) \;\; e_2^T(k)]^T,$$

where $e_1$ and $e_2$ are the noise vectors. Following (3.5.13 -
3.5.15), $g^T(k)\theta = 0$. Define the covariance of h(k) over the
m available samples as

$$R(k) = \sum_{i=1}^{m} h(i)h^T(i). \qquad (3.5.17)$$

So

$$E\{h(k)h^T(k)\} = \lim_{m \to \infty} \frac{1}{m}R(k).$$

Again

$$E\{h(k)h^T(k)\} = E\{g(k)g^T(k)\} + \begin{bmatrix} E\{e_1(k)e_1^T(k)\} & E\{e_1(k)e_2^T(k)\} \\ E\{e_2(k)e_1^T(k)\} & E\{e_2(k)e_2^T(k)\} \end{bmatrix}.$$

Hence

$$E\{h(k)h^T(k)\}\theta = E\{g(k)g^T(k)\}\theta + \begin{bmatrix} \sigma_1^2 I & 0 \\ 0 & \sigma_2^2 I \end{bmatrix}\theta = \begin{bmatrix} \sigma_1^2 I & 0 \\ 0 & \sigma_2^2 I \end{bmatrix}\theta, \qquad (3.5.18)$$

since $g^T(k)\theta = 0$. If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, that is the input and the
output noise variances in (3.5.18) are equal (when they are
not equal, with appropriate scaling they can be made equal),

$$E\{h(k)h^T(k)\}\theta = \sigma^2\theta;$$

so, the parameter vector $\theta$ is an eigenvector of the n×n
matrix $E\{h(k)h^T(k)\}$. The parameter estimation follows
through the following arguments:
(i) If the noise variance is significantly small compared
with the smallest eigenvalue of $E\{g(k)g^T(k)\}$ in
(3.5.18), $E\{h(k)h^T(k)\}$ will have one significantly
small eigenvalue.

(ii) If the noise variance is small, and if $E\{h(k)h^T(k)\}$ has
one significantly small eigenvalue, $\theta$ will correspond
to the eigenvector associated with the smallest
eigenvalue of $E\{h(k)h^T(k)\}$.

Estimation using SVD 

The n×n covariance matrix is

$$R(k) = H^T(k)H(k),$$

where the m×n matrix is

$$H(k) = [h(k-m+1) \;\; h(k-m+2) \; \ldots \; h(k)]^T,$$

n being the length of the parameter vector $\theta$. Consider the
singular value decomposition of H(k):

$$H(k) = USV^T, \qquad (3.5.19)$$

where U and V are orthogonal matrices: $U^TU = I$, $V^TV = I$,
and $S = \mathrm{diag}[s_1, s_2, \ldots, s_p]$ is the diagonal matrix
with $p = \min(m,n) = n$, as $m \gg n$. Hence

$$R(k) = VS^2V^T. \qquad (3.5.20)$$

So, the smallest eigenvalue of R(k) is given by the smallest
diagonal element in $S^2$: $s_n^2$, and the corresponding
eigenvector is the last column of V. In other words, the
parameter vector $\theta$ is given by the last column of V,
obtained from the SVD of H(k), and hence R(k) need not be
explicitly formed.
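
The SVD-based estimate can be sketched as follows, assuming NumPy. The function name `kl_estimate`, the separate orders `na` and `nb`, and the noise-free white-noise test data are assumptions for illustration; the polynomial coefficients are those of the simulated process used in Example 3.5.3. With noise-free data the smallest singular value is numerically zero and the scaled vector reproduces the coefficients; with noisy data it only approximates them.

```python
import numpy as np

def kl_estimate(y_meas, u_meas, na, nb):
    """Right singular vector of H for the smallest singular value,
    scaled so that its first element equals 1."""
    start = max(na, nb + 1)
    rows = [
        [y_meas[k - i] for i in range(na + 1)]           # y'(k) ... y'(k-na)
        + [u_meas[k - 1 - j] for j in range(nb + 1)]     # u'(k-1) ... u'(k-1-nb)
        for k in range(start, len(y_meas))
    ]
    H = np.array(rows)
    _, s, Vt = np.linalg.svd(H, full_matrices=False)
    v = Vt[-1]                                           # last column of V
    return v / v[0], s

# Noise-free data from y(k) = 1.5 y(k-1) - 0.7 y(k-2) + u(k-1) + 0.5 u(k-2).
rng = np.random.default_rng(0)
u = rng.standard_normal(200)
y = np.zeros(200)
for k in range(2, 200):
    y[k] = 1.5 * y[k - 1] - 0.7 * y[k - 2] + u[k - 1] + 0.5 * u[k - 2]

theta, s = kl_estimate(y, u, na=2, nb=1)
# theta is ordered as [1, a1, a2, -b0, -b1]
```

Scaling by the first element removes the sign and scale ambiguity of the singular vector, so the result is the negative of the vector in (3.5.14).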


Example 3.5.3 Given the noisy input and output data, 
estimate the parameters of the simulated process 

$$A(q^{-1})y(k) = B(q^{-1})u(k-1) + e(k),$$

where

$$A(q^{-1}) = 1 + a_1q^{-1} + a_2q^{-2} = 1 - 1.5q^{-1} + 0.7q^{-2},$$

$$B(q^{-1}) = b_0 + b_1q^{-1} = 1 + 0.5q^{-1},$$

and u(k) is generated as $(1 - 0.9q^{-1})^{-1}e(k)$, where e(k) is
Gaussian white noise with unit variance.
Table 3.5.3 Estimated parameter values

Exercise       a_1       a_2       b_0       b_1

1            -1.536    0.7147    1.064     0.1891
2            -1.585    0.7492    0.847     0.3182
3            -1.521    0.7444    1.176     0.5076
4            -1.468    0.6767    0.989     0.6151
5            -1.525    0.7117    1.078     0.3237
6            -1.533    0.7389    0.726     0.7200
7            -1.461    0.6561    1.206     0.3101
8            -1.523    0.7050    1.074     0.3584
9            -1.524    0.7265    0.896     0.5912
10           -1.596    0.7716    1.606     0.2736

Average      -1.527    0.7195    1.0662    0.4207
Std. dev.     0.040    0.0328    0.2285    0.1659
True value   -1.500    0.7000    1.0000    0.5000


Here, 

$$y(k) + a_1y(k-1) + a_2y(k-2) - b_0u(k-1) - b_1u(k-2) = e(k).$$

The values of the variables on the LHS constitute the columns of
H. Assuming 200 sets of data being available, H is a 200×5
matrix. The SVD of H is produced and the last column of the 5×5
matrix V is noted; this column vector, normalized by its
first element, gives the parameter values.

The exercise is repeated 10 times with different sets 
of the data. Irrespective of large additive input and the 
output noise, the results obtained are reasonably close to 
the true parameter values as shown in Table 3.5.3. 

Remark : This example is taken from Fernando and Nicholson 
(1985). 

Recursive estimation 
Define the information matrix

$$P(k) = R^{-1}(k).$$

Hence using (3.5.17)

$$P^{-1}(k) = P^{-1}(k-1) + h(k)h^T(k). \qquad (3.5.21)$$

Again following (3.5.20)

$$P(k) = VS^{-2}V^T.$$

So the lowest eigenvalue of R(k) corresponds to the largest
eigenvalue of P(k), which will be $1/s_n^2$, and the corresponding
column of V will give the parameter vector. Let

$$P(k) = C^T(k)C(k),$$

and let the SVD of C(k) be given by $C(k) = U_cS_cV_c^T$; then

$$C^T(k)C(k) = V_cS_c^2V_c^T = P(k) = VS^{-2}V^T.$$

Since the eigenvalues of a matrix are unique, the diagonal
elements of $S_c^2$ will be the same as those of $S^{-2}$; so the
parameter vector $\theta$ will be given by the first column of $V_c$,
which corresponds to the first and the largest diagonal
element in $S_c$.

Noting the similarity between (3.5.21) and the 
equations (3.4.8-3.4.9), the sequential update of C(k) can 
be formulated through the U-D covariance measurement update 
algorithm discussed in Appendix 3A. 

Summary 

(1) Recursively update C(k) with the availability of a new 
data set h(k) (see Example 3A3, in Appendix 3A). 

(2) Compute the SVD: $C(k) = U_c(k)S_c(k)V_c^T(k)$.

(3) The parameter vector will be given by the first column
of $V_c(k)$ corresponding to the largest singular value in $S_c$.

Remarks 

(a) The identification method studied here, requires the 
smallest singular value of H(k) to be distinctly small. More 
than one singular value of H(k) being nearly equally small 
in (3.5.19) causes ambiguity, and hence is undesirable. This 
can happen, if the noise is not small or if H(k) is over- 
parameterized, in which case reparameterization will be 
required. 

(b) The KL method can work with noise corrupted outputs as 
well as inputs. The noise may not be Gaussian in nature. 


3.6 MODEL SELECTION AND VALIDATION 

Before the parameters are estimated, the model structure and 
size have to be specified. Care should be taken in the 
choice of the model order and in the selection of the 
specific variables to be incorporated in the model. An 
overparameterized model is expected to overfit the data;
this is because if there is noise associated with the data, 
an overparameterized model tends to model extraneous noise as 
well as the information in the data. Such a model lacks 
in representativeness and shows poor validation against sets 
of data not used for modelling. 

This section starts with a discussion on the Akaike
Information Criterion, a popular method for model order
assessment. Subset selection is studied next. 'Subset
selection' is a generic term, meaning selection of specific
variables in the candidate set. The procedure for subset
selection is different for different problems. Here, first
subset selection based on the conventional QR with column
pivoting (QRcp) factorization is discussed, which is
followed by a case study on best subset AR modelling. Next a
modified QRcp factorization scheme is presented for subset
selection in a regression problem.


3.6.1 Akaike Information Criterion (AIC) 

In identification, any information criterion used for 
selection of the optimal model consists of two components: 

(i) a measure of the best model fit and 

(ii) a penalty measure on the number of model variables. 

Akaike Information Criterion (Akaike, 1974) can be stated as
follows. When a model with q independently adjusted parameters
is fitted to the data, the AIC of the set $S_q$ is
defined as

$$\mathrm{AIC}(S_q) = \log_e(\sigma_\epsilon^2) + q\gamma, \qquad (3.6.1)$$

where $\gamma = 2$ and $\sigma_\epsilon^2$ is the variance of the residual or the
model-fitting error; the model with minimum value of AIC is
selected. It is expected that the parameter estimation is
based on maximization of the information entropy, as in the case
of the maximum likelihood estimator. For linear regression AIC
reduces to

$$\mathrm{AIC}(S_q) = N\log_e\left[\frac{1}{N}\sum_{k=1}^{N} e^2(k)\right] + 2q,$$

where N is the number of data inputs, and e is the estimation
error: $e(k) = y(k) - \hat{y}(k)$, $\hat{y}(k)$ being the estimate of
y(k) based on a model with q parameters. So candidate models
of different possible regressor variables are considered and
estimated; the model with lowest AIC is selected. AIC may
be used with stepwise regression; in stepwise regression
(also called the nested class of models), the model
complexity is increased in steps, and AIC may be used for
finding the best model. One disadvantage is that there may
be many candidate combinations; subset selection can be
useful in such cases as discussed in Sec. 3.6.3.

Usually AIC tends to overestimate the model order. It
was proposed (Bhansali and Downham, 1977) that $\gamma$ in (3.6.1)
may be increased up to 5 to penalize over-parameterization
more stringently. There have been many other propositions,
e.g., Schwarz (1978) proposed $\gamma = \log_e(N-q)$. Model order
selection criteria are also discussed in Parzen (1974) and
Shibata (1985). All the model order selection criteria have
a certain degree of inherent subjectiveness, and no particular
criterion can be said to be the best.
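
The use of AIC for order selection can be sketched as below, using the linear regression form with N times the log of the mean squared residual. NumPy is assumed; the function names, the simulated AR(2) series and the range of candidate orders are illustrative assumptions. All candidate models are fitted over the same samples so that N is common to every score.

```python
import numpy as np

def aic(residuals, q, gamma=2.0):
    """N log(mean squared residual) + gamma * q for a model with q parameters."""
    residuals = np.asarray(residuals, dtype=float)
    return residuals.size * np.log(np.mean(residuals ** 2)) + gamma * q

def ar_residuals(y, order, start):
    """LS fit of an AR model of the given order on samples y[start:]."""
    regressors = np.column_stack(
        [y[start - i: len(y) - i] for i in range(1, order + 1)]
    )
    target = y[start:]
    coeffs = np.linalg.lstsq(regressors, target, rcond=None)[0]
    return target - regressors @ coeffs

# Assumed test series: y(k) = 1.5 y(k-1) - 0.7 y(k-2) + e(k).
rng = np.random.default_rng(0)
y = np.zeros(500)
for k in range(2, 500):
    y[k] = 1.5 * y[k - 1] - 0.7 * y[k - 2] + rng.standard_normal()

max_order = 6
scores = {q: aic(ar_residuals(y, q, max_order), q) for q in range(1, max_order + 1)}
best_order = min(scores, key=scores.get)
```

Raising `gamma` above 2 penalizes extra parameters more heavily, which is the adjustment suggested above for countering the tendency of AIC to overestimate the order.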


3.6.2 Subset Selection from an Information Set 

Given any m×n information set A with $m \geq n$, the objective is
to select an m×g subset $A_1$ (g < n) of A, which contains the
salient part of the information contained in A. In the
regression context, A is the same as H in (3.2.4), where the
objective is to select the g significant variables out of
the n variables; m indicates the length of the data sets.
SVD followed by QRcp factorization has been used for subset
selection.

Selection procedure 

Let the SVD of A be given by $A = USV^T$, where $U = [u_1, \ldots, u_m]$,
$V = [v_1, \ldots, v_n]$, and $S = \mathrm{diag}[s_1, \ldots, s_p]$ with
$s_1 \geq \ldots \geq s_p \geq 0$, $p = \min(m,n)$. U
and V are the left and the right singular vector matrices
respectively. The left and the right singular vectors form a
basis for the column-space and the row-space of A
respectively. Again

$$A = \sum_{i=1}^{p} u_i s_i v_i^T.$$

If g of the p singular values of A are dominant, that is
$s_{g+1}, s_{g+2}, \ldots, s_p$ are insignificantly small, the prime
information of A will be contained in

$$\bar{A} = \sum_{i=1}^{g} u_i s_i v_i^T.$$

Again

rank(A) = the number of nonzero singular values.

So, a selection of $A_1$, the prime m×g subset of A, should
correspond to the set of singular values $(s_1, \ldots, s_g)$, $g \leq p$,
implying

$$\mathrm{rank}(A_1) = \mathrm{pseudorank}(A) = g.$$

QRcp factorization can be used for the selection of the
subset $A_1$ as follows.

Let $\bar{V}$ consist of the first g columns of V, that is

$$\bar{V} = [v_1 \; v_2 \; \ldots \; v_g], \text{ and let } \bar{V}^T = [\bar{V}_1^T \;\; \bar{V}_2^T],$$

where $\bar{V}_1$ is a g×g and $\bar{V}_2$ is an (n-g)×g matrix. QRcp
factorization, performed on $\bar{V}^T$, will produce the n×n
permutation matrix P, where

$$Q^T[\bar{V}_1^T \;\; \bar{V}_2^T]P = [R_{11} \;\; R_{12}],$$

such that $R_{11}$ is upper triangular and Q is a matrix with
orthonormal columns. Define matrix $A_1$ as

$$[A_1 \;\; A_2] = AP,$$

where $A_1$ is an m×g matrix and $A_2$ is an m×(n-g) matrix; $A_1$
will have the g prime columns of A arranged sequentially in
order of decreasing importance (starting from the first
column). Thus a dominant subset of A is selected.

Remarks 

(a) Subset selection will be unique if the (p-g) singular
values of A are zero. Precise selection of subsets requires
a large gap or jump in the distribution of the singular
values (i.e. $s_i \gg s_{i+1}$, where $1 \leq i \leq p$), otherwise it may not
provide sufficient information.

(c) The QRcp factorization on V for subset selection is 
more robust than performing the same on A. 

Principle of selection 

The present selection of columns through QRcp factorization 
is based on the Euclidean norm. First the column with 
maximum Euclidean length is selected. Next the column having
maximum orthogonal component to the selected column is 
selected and so on. So the i-th selected column is the one 
having maximal orthogonal component to the subspace spanned 
by the earlier selected i-1 columns. The sequence of the 
selections is stored in the permutation matrix P. 

The mechanism of pivoting of columns within QRcp 
factorization is explained in Appendix 3B. 
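
The selection principle described above can be sketched with a greedy pivoting loop applied to the leading right singular vectors. This is a simplified stand-in written for illustration, assuming NumPy; it is not the QRcp routine of Appendix 3B, and the function names `pivot_order` and `select_subset` as well as the small test matrix are assumptions. The sketch assumes g does not exceed the rank of the matrix being pivoted.

```python
import numpy as np

def pivot_order(M, g):
    """Greedy column pivoting: repeatedly pick the column of M with the largest
    component orthogonal to the columns already picked."""
    R = np.array(M, dtype=float)        # working copy, deflated at each step
    chosen = []
    for _ in range(g):
        norms = np.linalg.norm(R, axis=0)
        for c in chosen:
            norms[c] = -1.0             # never pick a column twice
        j = int(np.argmax(norms))
        chosen.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(q, q @ R)      # remove the component along q
    return chosen

def select_subset(A, g):
    """Indices of g dominant columns of A: SVD first, then pivoting on the
    first g right singular vectors."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return pivot_order(Vt[:g, :], g)

# Assumed example: the third column carries almost no information.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1e-6],
              [1.0, 1.0, 0.0]])
order = pivot_order(np.array([[3.0, 0.0, 1.0], [0.0, 1.0, 0.0]]), 2)
subset = select_subset(A, 2)
```

In the small pivoting example the first column is picked for its length, and the second is preferred over the third because the third lies entirely along the first.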


3.6.3 Case Study: Best Subset-AR Modelling using 
Information Criterion and Subset Selection 

For optimal modelling of any time series, both the number of 
variables and the specific variables within the model have 
to be optimally chosen. Any additional term in the model may 
permit the model to represent noise, uncharacteristic of the 
process, along with the actual underlying process which is 
undesirable. Suppose a time series $\{y(\cdot)\}$ can be expressed
by a full-set AR model of maximal order n. The objective is
to identify the best (in terms of minimum AIC or some such
information criterion) subset-AR model of order r (<n).

The usual procedure is to find AIC exhaustively for all
possible models (with all possible combinations of n
candidate regressors); the model for which AIC is minimum is
considered to be the best subset-AR model. So, for n
regressors, $(2^n - 1)$ different models have to be considered,
which can be computationally expensive. Use of subset
selection along with AIC can greatly reduce the domain of 
exhaustive search for the best subset-AR model. 

The problem 

For any stationary time series {y(.)}, the highest lag n
for which the partial autocorrelation function of the
stationary series is significant is considered as the
maximal order of the linear full-set AR model. Let {y(.)} be
modelled as

    y(k) = -a_1 y(k-1) - a_2 y(k-2) - \dots - a_n y(k-n) + e(k),    (3.6.2)

where e is the noise or uncertainty. So following (3.2.1 -
3.2.4) the estimation problem can be expressed as

    y = A\theta + e,    (3.6.3)

where

    y = [y(n+1) \; y(n+2) \; \dots \; y(m+1)]^T,

    A = \begin{bmatrix}
        y(n)   & y(n-1) & \dots & y(1)     \\
        y(n+1) & y(n)   & \dots & y(2)     \\
        \vdots & \vdots &       & \vdots   \\
        y(m)   & y(m-1) & \dots & y(m-n+1)
        \end{bmatrix},

    \theta = [-a_1 \; -a_2 \; \dots \; -a_n]^T,    m > n,


and e is the respective noise vector. Here {y(k)} can be a
part of any larger series. The existence of a representative
subset-AR model of order r (<n) will show up in one or more
of the n singular values of A being relatively small.

There are two basic issues in the present
identification problem:

(i) The linearly dependent columns of A should be
eliminated,

(ii) Only appropriate columns of A should constitute the
linear model (3.6.3); here the appropriateness is decided
based on the minimum value of the information criterion.


Information criteria used 


The present study uses two different information criteria:
AIC and SIC, the latter being the Schwarz Information
Criterion (Schwarz, 1978). The following normalized forms
are used. For any set S_r, with r number of independent
variables,

    AIC(S_r) = \frac{2r}{N-r} + \log_e \hat{\sigma}_e^2,  and

    SIC(S_r) = \frac{r \log_e (N-r)}{N-r} + \log_e \hat{\sigma}_e^2,

where N is the number of stationary observations, r is the
order of the full model, and (N-r) is the effective number
of observations for fitting the model; \hat{\sigma}_e^2, the estimated
noise variance, is given by

    \hat{\sigma}_e^2 = \frac{1}{N-r} \sum_{i=r+1}^{N} \hat{e}_i^2,

where \hat{e}_i are the residuals.
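
As a minimal sketch of how the two criteria may be evaluated once a model has been fitted, the following Python function takes the residuals and the number of regressors r. It assumes the normalized forms AIC = 2r/(N-r) + log(variance) and SIC = r log(N-r)/(N-r) + log(variance), with the variance taken as the mean squared residual over the (N-r) effective observations; the function name information_criteria and the residual values are hypothetical.

```python
import math

def information_criteria(residuals, r):
    """Return (AIC, SIC) for a model with r regressors.

    residuals: the N - r residuals e_(r+1), ..., e_N of the fitted model.
    """
    n_eff = len(residuals)                       # effective observations, N - r
    sigma2 = sum(e * e for e in residuals) / n_eff
    aic = 2.0 * r / n_eff + math.log(sigma2)
    sic = r * math.log(n_eff) / n_eff + math.log(sigma2)
    return aic, sic

# Illustrative residuals only: unit variance, so the log term vanishes.
aic, sic = information_criteria([1.0, -1.0] * 5, r=2)
```

Since log(N-r) exceeds 2 once (N-r) is larger than about 7, SIC penalizes each additional regressor more heavily than AIC for all but very short series.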
Modelling using subset selection and AIC (or SIC)

The identification of the best subset-AR model can be
performed through the following three steps.

(1)(a) Perform SVD of the m×n matrix A (A = USV^T). Choose the
       possible pseudoranks of A from the magnitude of the
       singular values.

(1)(b) For each pseudorank g (2 ≤ g ≤ n-1), perform QRcp
       factorization on the g×n matrix V_1 and select the set
       of relatively independent regressor variables of
       size g; so the subset S_g of regressors corresponding
       to each pseudorank g is defined.

(2)    For each subset-AR model (corresponding to each
       pseudorank g), AIC (or SIC) is computed. The subset
       S_r, corresponding to pseudorank r for which AIC
       (or SIC) attains the minimum value, is selected; the
       r regressors of S_r are the candidates for the desired
       best subset-AR model.

(3)    For the (2^r - 1) models, with all possible combinations
       of regressors, AIC (or SIC) is determined. The
       model producing the minimum value for AIC is the best
       subset-AR model.

Remark: The consequent reduction of the exhaustive search
space for the best subset-AR model is from (2^n - 1) to (2^r - 1),
which is computationally advantageous as r can be much
smaller than n.
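
The size of the reduction is easy to check numerically. The sketch below, in Python, enumerates every non-empty subset of a candidate set, as step (3) requires; the helper name all_subsets is an assumption, and the numbers used (n = 9 candidate lags, reduced to the three lags 9, 1 and 2) are those of the sunspot example that follows.

```python
from itertools import combinations

def all_subsets(regressors):
    """Every non-empty subset of the candidate regressors."""
    return [subset
            for size in range(1, len(regressors) + 1)
            for subset in combinations(regressors, size)]

full_search = 2 ** 9 - 1            # all subsets of 9 candidate lags
reduced = all_subsets([9, 1, 2])    # candidates left after subset selection
```

Each tuple in reduced is one model for which the parameters would be estimated and the information criterion computed.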


Example 3.6.3(1) Subset-AR model for the sunspot series 

The yearly averaged series of the sunspot numbers (Appendix 
8A) over the years 1700-1920 are considered for this study. 
Since the partial autocorrelation is high for the highest 
lag 9, the full AR model is considered to be of order 9. So 
the AR model is given by (3.6.2) with n=9. Using the 
available data, a 213x9 matrix A is formed. SVD of A 
produces the singular values (s_1 to s_9):

2053.70654 960.93292 772.59265 293.06665 211.43785 

130.61121 98.31714 85.66112 78.89665. 

For each pseudorank 8 to 2, subset selection of A is
performed. The parameters of the AR model corresponding to
each subset are now estimated and the AIC and the SIC are
computed. The results are shown in Table 3.6.3(1).

Table 3.6.3(1) Subset selection and AIC/SIC for Sunspot series

pseudo-   Set of regressors S(g)           AIC       SIC
rank g    (lag of regressor variables)

  8       6 9 5 7 8 4 2 3                  6.29589   6.43886
  7       9 1 8 2 5 4 7                    5.37993   5.50702
  6       9 1 6 4 8 3                      5.44842   5.55962
  5       9 1 5 3 7                        5.45650   5.55181
  4       9 1 6 4                          5.57438   5.65381
  3       9 1 2                            5.34758   5.41091
  2       8 2                              6.79878   6.84628


Since both AIC and SIC are minimum corresponding to the
pseudorank 3, an exhaustive search is made in this set to
obtain the best subset model, which is found to be

    y(k) = 1.2495 y(k-1) - 0.551 y(k-2) + 0.15 y(k-9) + e(k),    (3.6.4)

with \hat{\sigma}_e^2 = 203.261. Thus in this particular example, the
present method directly selects the best subset-AR model.


Example 3.6.3(2) Subset-AR model for German unemployment 
series 

365 monthly observations (Jan. 1948 to May 1977) of this 
series (Appendix 20 are used for modelling. The partial 
autocorrelation function shows the maximal order to be 19;
for the present exercise the maximal order is overestimated
as 20. The series {y(k)} is transformed to a relatively
stationary series {z(k)}, where z(k) = (1 - q^{-1})(1 - q^{-12}) y(k),
and {z(k)} is modelled. Here A is a 332×20 matrix. The
singular values of A are as follows (s_1 to s_20):

(0.32460 0.32016 0.30310 0.28957 0.25989 0.24250 

0.21911 0.21372 0.20576 0.20087 0.16128 0.15731 

0.15257 0.14785 0.14571 0.14445 0.13577 0.13246 

0.12687 0.12464) × 10^7

The subsequent subset selection and computation of AIC and
SIC values are shown in Table 3.6.3(2).

The minimum AIC is obtained for the pseudorank 12,
whereas the minimum SIC is obtained for pseudorank 9.

Table 3.6.3(2) Subset selection for pseudoranks g, and AIC
and SIC for German unemployment series

 g   Column numbers                                             AIC       SIC

19   11 10 12  9  2 19  3 13 20  4 17 15 16 14  7  8  1 18  5   22.9141   23.1433
18   11 10 12  9  2 13 14 19  4 16 15  1  6  7 20  3  8 18      22.9129   23.1306
17   11 10 12  9  2 16 13  8  4 15 14  7  6 20 19  1 17         22.9060   23.1123
16   10 11 12  9  2 15  6  7 14 18 17 16 13  8 20  4            22.9088   23.1036
15   10 11 12  9  2  6 15  8  4 18  1 16  3 17 13               22.8955   23.0789
14   10 11 12  9  7  2  5 16  4 17 15  6 18  8                  22.9006   23.0725
13   10 11  9 12  2  5  6 16  3 15  4 14 18                     22.8973   23.0577
12   10 11  9 12  2  4  3  8 19 17  1 14                        22.8918   23.0408
11   11 10 12  9  2  3  4  8  7 19 18                           22.8991   23.0366
10   10 11  9 12  2  4  5  3  8 18                              22.8939   23.0200
 9   11 10  2 12  7  5  8 13 20                                 22.8966   23.0112
 8   11  2 10  1  7  5  6  4                                    23.0579   23.1610
 7    2 11 10  1  5 15 14                                       23.0473   23.1390
 6   10  1 11  9  2 17                                          23.0419   23.1222
 5    1 20  9  2 12                                             22.9991   23.0679

With pseudorank 12, the best subset-AR model is obtained as


    z(k) = -0.09696 z(k-1) - 0.13641 z(k-2) + 0.07677 z(k-9)
           - 0.30327 z(k-11) - 0.37410 z(k-12) + e(k),    (3.6.5)

with \hat{\sigma}_e^2 = 0.818309×10^10, AIC = 22.8615 and SIC = 22.9302.

With pseudorank 9, exhaustive search produces the best
subset-AR model as

    z(k) = -0.11640 z(k-2) - 0.30393 z(k-11) - 0.40624 z(k-12) + e(k),    (3.6.6)

with \hat{\sigma}_e^2 = 0.835744×10^10, AIC = 22.8705, and SIC = 22.9164.

The one-step forecast variances for the two models
(3.6.5) and (3.6.6) work out to be 0.499007×10^9 and
0.469641×10^9 respectively, which are very close, although the
domain for exhaustive search for the SIC based model is much
smaller in this case.

Remarks 

(a) It is found (Sarkar and Kanjilal, 1995) that given the 
maximal order, this method leads to the optimal subset-AR 
model in terms of the minimum value of the concerned 
information criterion. 

(b) The SIC based approach is found to lead to a smaller
search space for the best subset-AR model than the AIC based
approach.

(c) In the search for S_r, the minimum value of the information
criterion (IC) may not be taken literally, since no
information criterion is absolutely perfect. In the present
context, if the IC for a certain subset with lower
pseudorank is close to the minimum IC corresponding to a
larger pseudorank, the optimal or the near optimal set may
lie within the former set. However, the optimal set always
lies within the set S_r, for which IC is minimum.


3.6.4 Linear Regression through Subset Selection 

In Sec.3.6.2, the selection of a subset from an information
set has been discussed. The problem in using this scheme for
selection of the prime independent variables in linear
regression is that only the candidate regressors are
considered here, while their relationships with the output
vector y cannot be taken into account. To use this subset
selection scheme for selection of the best set of regressors,
some additional procedure (like the use of the information
criterion as in Sec.3.6.3) is required to take y into
consideration. The subset selection scheme is directly
applicable for selection of regressors only where the
information or data matrix shows a distinct jump in the
distribution of its singular values, as in the case of Example
3.3.3.

QR factorization incorporating a modified column
pivoting scheme for direct successive selection of the most
significant regressor variables in order of maximal (mutual)
independence as well as correlation with the output vector
is presented in this section. Consider the linear modelling
problem:

    y = b_1 x_1 + \dots + b_i x_i + \dots + b_n x_n,    (3.6.7)

where y is the output and x_1 to x_n are the regressor
variables. Expressing in concise matrix notation for m sets
of data

    y = A\theta,    (3.6.8)

where A = [a_1, \dots, a_i, \dots, a_n] is the m×n data matrix
containing m vectors of n regressors a_i, and \theta is the
n-parameter vector. The objective is to select r (<n) most
significant variables, leading to the model

    y = \bar{A}\bar{\theta} + e,    (3.6.9)

where the m×r matrix \bar{A} is a subset of A and \bar{\theta} is the
corresponding least squares (LS) estimated r-parameter
vector; e stands for the modelling error or noise. It is
assumed that m>n but limited sets of data are available.

Conventional methods 

The desired selection of regressors can be performed
(a) successively or (b) cumulatively. For successive
selection, the regressor with strongest correlation with the 
output is first selected, and a model with this variable and 
the output is formed. The consequent residual is computed 
and the variable having maximum correlation with the 
residual is selected as the next variable, and once again a 
model is formed using the selected regressors and so on. 
Such a method is discussed in Draper and Smith (1968). 

For cumulative selection, all candidate models comprising
the sets of r variables (2 ≤ r < n) and the output are
formed and the set minimizing a specific statistic is
determined; the popular C_p statistic constitutes two
additive terms:

    C_p = RSS_r / s^2 - (m - 2r),    (3.6.10)

where RSS_r is the residual sum of squares of the r-parameter
model and s^2 is the residual mean square of the full
n-parameter model. Such a method is discussed in Daniel and
Wood (1971).
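
A minimal Python sketch of (3.6.10) is given below. It assumes that s^2 is the residual mean square of the full n-parameter model, i.e. RSS_n/(m-n); the function name mallows_cp and the numerical values are purely illustrative. Under that assumption the statistic evaluated for the full model equals n, which gives a convenient check.

```python
def mallows_cp(rss_r, s2, m, r):
    """C_p = RSS_r / s^2 - (m - 2r), as in (3.6.10)."""
    return rss_r / s2 - (m - 2 * r)

# Full model as a check: with s^2 = RSS_n / (m - n), C_p equals n.
m, n = 24, 5
rss_full = 38.0                     # illustrative value only
s2 = rss_full / (m - n)
cp_full = mallows_cp(rss_full, s2, m, n)
```

A subset with r parameters is usually regarded as adequate when its C_p value is close to r.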

Both the above approaches involve explicit modelling 
with all candidate sets of regressors and hence the 
computational requirement is high. The former leads to 
optimal successive selection and the latter leads to the 
optimal selection. 

One direct approach for subset selection based on a
modified QRcp (m-QRcp) factorization method is discussed 
next. 

Fast subset selection using m-QRcp 

In m-QRcp factorization based subset selection, the pivoting
of the columns is based on the correlation between the
rotated output and the rotated candidate variable vectors;
the variable showing maximum correlation is selected. The
sequence of successive selections is registered in the
permutation matrix P, where

    Q^T A P = R,    Q = [q_1, \dots, q_i, \dots, q_n];

q_i are orthonormal columns, and R is upper triangular. The
columns of regressor variables in A are assumed to be normalized
to unit vectors. The successive rotation (leading to
the successive selection) of the columns is explained below.

The column vector of A producing max(a_j^T y) is the most
significant one (in successive selection terms), which is
swapped with a_1. The A so formed is appended by y forming
X = [A y]. m-QRcp factorization of X is now performed as
follows. Using the Gram-Schmidt orthogonalization concept,
q_1, the unit vector in the direction of a_1, is determined as
a_1/\|a_1\|. The portion of a_j (j = 2 to n) and y in a direction
orthogonal to a_1 will be given by

    (a_j - q_1^T a_j q_1)  and  (y - q_1^T y q_1)

respectively. This operation is referred to as rotation of
a_j and y w.r.t. a_1. The selected second vector is the one
maximizing

    (a_j - q_1^T a_j q_1)^T (y - q_1^T y q_1),

which is swapped with a_2.

At the i-th stage of selection, the rotated variable-vectors
(a_j^*) and the rotated output (y^*) vector are

    a_j^* = a_j - (q_1^T a_j q_1 + \dots + q_{i-1}^T a_j q_{i-1}),
            i = 2 to n, j = i to n,

    y^* = y - (q_1^T y q_1 + \dots + q_{i-1}^T y q_{i-1}),

and the i-th selected vector is the one maximizing a_j^{*T} y^*.

The selection is continued for up to r stages. Since
the column swappings are recorded in the permutation matrix
P, the subset matrix \bar{A} in (3.6.9) is given by the first r
columns of AP. The r parameters \bar{\theta} in (3.6.9) are estimated
using the LS method.
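
The successive rotation and selection can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the author's implementation: the function name mqrcp_select is hypothetical, the columns are ranked by the absolute value of the inner product between the rotated column and the rotated output (so that negatively correlated regressors are also eligible), and no permutation matrix is formed; the list of chosen column indices plays that role.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mqrcp_select(columns, y, r):
    """Return indices of r regressors chosen by successive rotation."""
    # Normalize the candidate columns to unit vectors, as the text assumes.
    a = [[v / math.sqrt(dot(c, c)) for v in c] for c in columns]
    y_rot = list(y)
    remaining = list(range(len(a)))
    chosen = []
    for _ in range(r):
        # Assumption: rank by |a_j*^T y*| so negative correlations count too.
        pivot = max(remaining, key=lambda j: abs(dot(a[j], y_rot)))
        norm = math.sqrt(dot(a[pivot], a[pivot]))
        if norm == 0.0:
            break                     # nothing independent is left
        chosen.append(pivot)
        remaining.remove(pivot)
        q = [v / norm for v in a[pivot]]
        # Rotate the remaining columns and the output away from q.
        for j in remaining:
            p = dot(q, a[j])
            a[j] = [v - p * qi for v, qi in zip(a[j], q)]
        p = dot(q, y_rot)
        y_rot = [v - p * qi for v, qi in zip(y_rot, q)]
    return chosen

# The fourth column duplicates the direction of the second one.
cols = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
picked = mqrcp_select(cols, [1.0, 5.0, 3.0, 0.0], 3)
```

Once the second column is selected, the duplicate fourth column is rotated to zero and can no longer compete, which is how the scheme avoids collinear regressors.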

An application of this subset selection scheme follows. 

Remarks 

(1) a_j^* lies in a plane orthogonal to the subspace spanned by
the earlier (i-1) selected vectors.

(2) r can be decided based on the value of C_p in (3.6.10).

(3) The selection is direct and it does not involve any
explicit parameter estimation for selection and hence the
computational requirement is minimal.


Example 3.6.4 Modelling of rocket engine testing

In this problem the output variable is the chamber pressure
(y), and the independent variables are the temperature of
the cycle (x_1), the vibration (x_2), the drop shock (x_3), and
the static fire (x_4). 24 sets of data are available (Draper
and Smith, 1967, p.218), which are as follows:

       y    |   X:  x_1    x_2    x_3    x_4

      1.4   |       -75      0      0    -65
     26.3   |       175      0      0    150
     26.5   |         0    -75      0    150
      5.8   |         0    175      0    -65
     23.4   |         0    -75      0    150
      7.4   |         0    175      0    -65
     29.4   |         0      0    -65    150
      9.7   |         0      0    165    -65
     32.9   |         0      0      0    150
     26.4   |       -75    -75      0    150
      8.4   |       175    175      0    -65
     28.8   |         0    -75    -65    150
     11.8   |         0    175    165    -65
     28.4   |       -75    -75    -65    150
     11.5   |       175    175    165    -65
     26.5   |         0    -75      0    150
      5.8   |         0    175      0    -65
      1.3   |         0      0    -65    -65
     21.4   |         0      0    165    150
      0.4   |         0    -75    -65    -65
     22.9   |         0    175    165    150
     26.4   |         0    -75    -65    150
     11.4   |         0    175    165    -65
      3.7   |         0      0      0    -65

The columns of X are filled with the values of x_1, x_2, x_3
and x_4 respectively. Accommodating nonlinearity in the
variables, the engine testing process can be expressed as

    y = f(x_1, x_2, x_3, x_4, x_1x_2, x_1x_3, x_1x_4, x_2x_3, x_2x_4, x_3x_4).    (3.6.11)

The matrix X is extended by the six quadratic terms shown
above; let the regressor variables in (3.6.11) be designated
as 1 to 10. First, the significant variables are to be selected.

X is appended by the vector y and modified-QRcp
factorization is performed. The results are presented in
Table 3.6.4 along with the optimum successive selection
(through exhaustive search) results. The cumulative squared
error (CSE) or RSS is computed based on the least squares
estimation of the parameters (including the mean or average
term).

Table 3.6.4 Regressor selection and modelling performance
in rocket engine testing problem

 r   Selection through m-QRcp:   CSE      Optimal selection:      CSE
     selected variables                   selected variables

 1   4                           297.58   4                       297.58
 2   4 10                        157.66   4 10                    157.66
 3   3 4 10                       97.58   2 4 10                   70.01
 4   2 3 4 10                     56.63   2 3 4 10                 56.63
 5   2 3 4 7 10                   53.80   2 3 4 8 10               52.45
 6   2 3 4 7 8 10                 48.50   2 3 4 7 8 10             48.50
 7   2 3 4 7 8 9 10               46.08   2 3 4 7 8 9 10           46.08
 8   1 2 3 4 7 8 9 10             44.85   1 2 3 4 7 8 9 10         44.85
 9   1 2 3 4 6 7 8 9 10           43.09   1 2 3 4 5 7 8 9 10       41.85


Remarks 

(1) The method discussed offers suboptimal (in minimum CSE 
sense) successive selection of regressors in a linear-in- 
the-parameter regression problem, quite close to the optimal 
selection, at a fractional computational cost. 

(2) The selection is inherently free from collinearity 
problems (among the regressor variables), as successive 
orthogonal subspaces are considered. 

(3) In linear regression, successive selection of regressors 
does not guarantee selection of the optimal (in minimum Cp 
sense) set, because the regression relationship with optimal 
selection is a group phenomenon. It is possible that two or 
more variables which are individually weakly correlated with 
the output constitute a set which is more correlated with 
the output than the most strongly correlated variable. 
However, successive selection, being invariably faster than 
cumulative selection, is worth considering particularly 
when the number of candidate regressors is large. 


3.6.5 Cross Validation 

In the cross validation approach, the available data set is
divided into two parts; one is used for parameter estimation
and the other is used for validation of the estimated
models. These two sets are also called the training set and
the checking set respectively. Models of increasing
complexity may be estimated using the training set, and the
validity or suitability of the structure is determined
based on the minimum of the sum of squared prediction errors
between the actual outputs and the simulated outputs computed
from the checking set using the estimated parameter values.
Although cross validation is a useful method, the following 
points may be noted: 

(a) The method is not based on any probabilistic or 
structural assumptions. 

(b) The complete data set is not available for estimation 
purposes. 

(c) There are no definite rules for selecting the number of 
data points in each set, particularly for short data sets; 
the different ways of dividing the data set may lead to 
different results. 

Cross validation requires that the estimation and the 
prediction data should represent the same process dynamics. 
The data are also expected to be sufficiently rich or 
informative. Hence, for historical data, which are usually
inferior in quality to the experimental data, larger data
sets are required. The data splitting methods are discussed
in Stone (1974), Snee (1977) and Draper and Smith (1968).
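
The procedure can be illustrated with a small Python sketch. Two candidate structures of increasing complexity (a constant and a straight line) are estimated on the training set, and the one giving the smaller sum of squared prediction errors on the checking set is retained. The data, the split point and all function names are assumptions introduced for the illustration.

```python
def fit_mean(x, y):
    mean = sum(y) / len(y)
    return lambda _x: mean

def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return lambda xv: my + slope * (xv - mx)

def checking_sse(model, x, y):
    """Sum of squared prediction errors on the checking set."""
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y))

# Synthetic data: a straight line with a small alternating disturbance.
x = list(range(12))
y = [2.0 * xi + 1.0 + (0.1 if xi % 2 else -0.1) for xi in x]
split = 8                                   # assumed split point
candidates = {"mean": fit_mean, "line": fit_line}
scores = {}
for name, fit in candidates.items():
    model = fit(x[:split], y[:split])       # estimate on the training set
    scores[name] = checking_sse(model, x[split:], y[split:])
best = min(scores, key=scores.get)
```

Moving the split point changes the scores, which reflects point (c) above: different ways of dividing the data may lead to different results.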

Cross validation is widely used in this book; for
example, see (i) COD modelling in Sec.9.4 using GMDH,
(ii) COD modelling using a single layer nonlinear model in
Sec.9.5, (iii) modelling of COD process and the Mackey Glass
series in Sec.10.4.2 and Sec.10.4.3 respectively.

Remark: There exists asymptotic equivalence between the
choice of models by Akaike's information criterion and by
the cross validation approach, as shown in Stone (1977).


3.7 CONCLUSIONS 

Some methods of parameter estimation and linear modelling 
have been discussed. The emphasis is on the method of the 
least squares estimation and its sequential implementation. 
The main attraction of the least squares method is its
simplicity of implementation and the ability to produce
workable parameter estimates in spite of the noise not being
independent of the observations. The estimator shows nice
convergence and asymptotic properties when the noise is
independent of the data. For off-line or batch processing,
the implementation using orthogonal transformation offers
superior numerical properties; in particular SVD based
implementation is extremely robust numerically as well as
computationally and is applicable even if the estimation
problem is ill-conditioned. For the sequential
implementation, square root algorithms are superior to the
general implementations; the U-D filter is well known for
its numerical stability and robustness.

An introductory outline of three other methods of
estimation (the instrumental variable method, the maximum
likelihood method and the Koopmans-Levin method) has also
been presented; the first two methods are well documented
and are widely used, while the potential of the
Koopmans-Levin method has not been fully explored. All these
methods can produce unbiased estimates even if the noise is
not independent of the observations.

In estimation, model selection is a difficult task. A
parsimonious model is desirable as it is likely to lead to a
comparatively representative model. AIC or its variants are
popularly used for model order selection, but most such
methods require exhaustive trials for ultimate selection of
the variables. It has been shown that use of SVD and QRcp
factorization based subset selection along with AIC or SIC
can result in substantial reduction in the domain of
exhaustive search for identification of the best subset-AR
model. It has also been shown that with a modified column
pivoting scheme QR (or m-QRcp) factorization can lead to
fast successive selection of regressors in linear
regression. Further research is needed in this area.

Cross validation of the identified and estimated model
against sets of data not used for modelling remains a viable
approach for testing the quality of the model.


CHAPTER 4


SOME POPULAR METHODS OF PREDICTION 


Popularly used methods based on time series models or 
transfer-function models can produce reasonably good 
predictions often at moderate computational cost. 


4.1 INTRODUCTION 

Some of the popular methods of modelling and prediction
used by time-series analysts, econometricians and process
and business analysts are presented in this chapter. The
models used broadly fall into the category of time series
models or transfer-function models. These methods are widely
covered in the literature, and a detailed study is beyond
the scope of this book. The purpose of this chapter is to
familiarize the reader with the underlying concepts of some
of these methods.

One of the simplest and yet robust methods of
prediction is the exponential smoothing based predictor,
which is discussed in Sec.4.2. Here the time series is
modelled through low-pass filtering (see Appendix 14A). If
the data have explicit structural components like a trend
or a periodic component etc., the individual structural
components may be separately modelled. The procedure has
attractively low computational requirement.

The Box and Jenkins method, studied in Sec.4.3, is one
of the most popular and powerful methods of prediction. The
data are appropriately transformed (through time-differencing
etc.) and converted into a stationary series, which is
represented by a transfer-function model. The model
essentially has two elements: (i) the explicit model of the
transformed series, and (ii) the implicit incorporation of
process information through the data transformation.
Although the computational requirement is moderately high,
this method has been successfully applied to a wide variety
of processes.

Two other methods briefly discussed in this chapter are 
the regression method and the econometric method. 


4.2 SMOOTHING METHODS OF PREDICTION

Smoothing methods of prediction are largely based on 
heuristic understanding of the underlying process. Time 
series with or without seasonal components may be treated. 
Usually the smoothing methods do not involve any estimation, 
as the concerned parameters of the model are assumed to be 
known. Some of the principal smoothing algorithms are 
presented in this section. 


4.2.1 Basic Smoothing Methods

The naive model

The simplest approach to prediction is to disregard any
dynamics and assume the variable not to change in future
from the present value, i.e. the one-step ahead prediction
is given by

    \hat{y}(k+1|k) = y(k).    (4.2.1)

In addition, it may be assumed that the non-zero trend in
the time series remains unchanged:

    \hat{y}(k+1|k) = y(k) + (y(k) - y(k-1)).    (4.2.2)

A time-averaged value for the trend may be used.

Averaging 

In simple moving averaging, the averaged value is computed
based on the averaging of the data over a moving window
(k,(k-n)):

    \hat{y}(k+1) = \frac{1}{n+1} \sum_{j=0}^{n} y(k-j),

typically n=3. Instead of applying equal weighting to the
successive data, it may be more meaningful to give more
importance to the more recent data.
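
The three basic predictors can be written out directly, as in the following minimal Python sketch. The function names and the sample series are illustrative assumptions; the series is held in a list with the most recent value last.

```python
def naive(y):
    """(4.2.1): the next value equals the present one."""
    return y[-1]

def naive_with_trend(y):
    """(4.2.2): carry the latest one-step change forward."""
    return y[-1] + (y[-1] - y[-2])

def moving_average(y, n=3):
    """Average over the window (k, k-n), i.e. the latest n+1 values."""
    window = y[-(n + 1):]
    return sum(window) / len(window)

series = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = (naive(series), naive_with_trend(series), moving_average(series))
```

For this steadily rising series the moving average gives 3.5, well below the latest value of 5, which shows how equal weighting of old data makes the predictor lag a trend.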

Linearly increasing weighting may be applied to the
data, but exponential weighting, referred to as exponential
smoothing, is more popular, in which case

    \hat{y}(k+1) = a\hat{y}(k) + (1-a)y(k)
                 = \frac{1-a}{1 - aq^{-1}} y(k),    0 < a < 1,    (4.2.3)

where on expansion,

    \frac{1}{1 - aq^{-1}} = 1 + aq^{-1} + a^2 q^{-2} + a^3 q^{-3} + \dots

Figure 4.2.1 Exponential weighting function 1/(1 - aq^{-1}),
for different values of a; a lies between (0,1).

The pattern of weighting, which is exponential, is shown in
Fig.4.2.1. The smaller the value of a, the faster the older
data are forgotten, or in other words, more importance is
ascribed to the more recent data, implying shorter memory
for the prediction scheme. For a = 0, (4.2.3) reverts to the
naive model (4.2.1); a = 1 is not permissible. Typical
values of a are 0.1 to 0.3.
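
A minimal Python sketch of the recursion, prediction(k+1) = a prediction(k) + (1-a) y(k), is given below. The start-up value (the first observation unless another is supplied) is an assumption, since the recursion needs some initial prediction; the function name and the data are illustrative.

```python
def exp_smooth_predict(y, a, initial=None):
    """One-step predictions: pred[k+1] = a*pred[k] + (1 - a)*y[k].

    Returns a list whose element k is the prediction of y[k] made at k-1;
    the last element is the forecast for the next, unseen value.
    """
    pred = [y[0] if initial is None else initial]   # assumed start-up value
    for obs in y:
        pred.append(a * pred[-1] + (1.0 - a) * obs)
    return pred

data = [10.0, 12.0, 11.0, 15.0]
naive_like = exp_smooth_predict(data, a=0.0)   # reduces to the naive model
smoothed = exp_smooth_predict(data, a=0.5)
```

With a = 0 every prediction simply repeats the latest observation, while a = 0.5 gives the forecast 13 for the next value, half way between the previous prediction (11) and the latest observation (15).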

Remarks 

(a) Analogy to low-pass filtering: In terms of frequency 
domain analysis, the exponential smoother acts as a low-pass 
filter (see Appendix 14A), where the relatively higher 
frequency components contained in the data sequence are 
attenuated. The higher the value of a, the lower are the 
frequencies passed through. 

(b) Lag in the smoothed results: In any data processing,
if only past values are used, the processed data tend to lag
in time with respect to the original time series. In case of
data with seasonal variations, the lag shows up in
the conspicuous shifts in the peaks and the troughs of the
series. The lag can be reduced by

(1) using centred averaging or smoothing (as in Appendix 4),

(2) bidirectional filtering (Sec.14.6.1), or

(3) backcasting or backward-smoothing the data prior to use
for forecasting or forward-smoothing (Box and Jenkins, 1976,
p.199).

One common feature of these three methods is the elimination
of the lag, which results from the inherent use of both the
past and the future data with respect to the point of
smoothing. The first two approaches are discussed elsewhere
as stated; in the last approach, the data set is reversed
in time and the normal smoothing is performed, the resulting
data are time-reversed and the smoothing process is
repeated. The result is lag-free smoothing.

Figure 4.2.2 One-step ahead prediction of the sunspot
series using exponential smoothing with a = 0.1
(data and prediction).
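
The backward-then-forward procedure may be sketched as follows in Python. The smoother is the exponential recursion s(k) = a s(k-1) + (1-a) y(k); starting each pass from the first value of the sequence is an assumption, as are the function names and the triangular test pulse.

```python
def smooth(x, a):
    """Exponential smoothing: s[k] = a*s[k-1] + (1 - a)*x[k]."""
    s = [x[0]]                       # assumed start-up value
    for value in x[1:]:
        s.append(a * s[-1] + (1.0 - a) * value)
    return s

def smooth_both_ways(x, a):
    """Smooth the time-reversed data, reverse the result, smooth again."""
    backward = smooth(x[::-1], a)[::-1]
    return smooth(backward, a)

def peak(s):
    """Index of the largest value."""
    return s.index(max(s))

pulse = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
one_way = smooth(pulse, 0.7)
two_way = smooth_both_ways(pulse, 0.7)
```

For this pulse the one-way result peaks one sample late, whereas the two-pass result peaks at the same sample as the data.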


Example 4.2.1 Prediction of the sunspot series over short 
time-span 

The modelling of the yearly averaged sunspot series
(Appendix 8A) features widely in this book. Here the
one-step ahead prediction of the series over twelve years
(1977-1987) is considered. The model (4.2.3) is used with a
= 0.1. The results are presented in Fig.4.2.2. As expected,
the prediction tends to lag the actual series; the lag
increases for higher values of a, where the high frequencies
are largely attenuated due to the effect of low-pass
filtering of the exponential smoother.


4.2.2 Multiple Smoothing Algorithms 

In (4.2.3) only one-parameter smoothing has been performed; 
instead the series may be considered as a composite of more 
than one structural component (like the average, the trend 
and the seasonal component) each of which can be individu- 
ally modelled. 

Predictor models for series without seasonality 

Such types of time series can be expressed as

    y(k) = y_{av}(k) + p y_{tr}(k) + e(k),    p = 0,

where y(k), y_{av}(k), y_{tr}(k) and e(k) are the data, the
average (or mean) value, the trend component and the
modelling error respectively at time k. Both the average and
the trend components are individually modelled using
exponential smoothing.

The p-step ahead prediction is given by

    \hat{y}(k+p|k) = y_{av}(k) + p y_{tr}(k).

One popular approach is Holt's method.


Two parameter double exponential smoothing: Holt's method 

The average and the trend components are modelled as 

    y_{av}(k) = (1-a) y(k) + a (y_{av}(k-1) + y_{tr}(k-1)),

    y_{tr}(k) = (1-\beta) y_{tr}(k-1) + \beta (y_{av}(k) - y_{av}(k-1));

a and \beta lie between (0,1). The predictor may be initialized
as


    y_{av}(1) = y(1),

    y_{tr}(1) = \frac{(y(1) - y(0)) + (y(2) - y(1))}{2}.
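
Holt's two updates can be run over a series as in the minimal Python sketch below. The update equations follow the ones given above; the start-up used here (first observation for the average, first difference for the trend) is a simplifying assumption and not the initialization stated in the text, and the function name and data are illustrative.

```python
def holt_predict(y, a, beta, p=1):
    """Run Holt's updates over y and return the p-step ahead prediction."""
    y_av = y[0]                      # assumed start-up: first observation
    y_tr = y[1] - y[0]               # assumed start-up: first difference
    for obs in y[1:]:
        previous_av = y_av
        y_av = (1.0 - a) * obs + a * (previous_av + y_tr)
        y_tr = (1.0 - beta) * y_tr + beta * (y_av - previous_av)
    return y_av + p * y_tr

ramp = [1.0, 3.0, 5.0, 7.0, 9.0]     # noise-free linear trend
forecast = holt_predict(ramp, a=0.3, beta=0.2, p=2)
```

On a noise-free ramp the average tracks the data and the trend stays at 2, so the two-step ahead forecast is 13 whatever the smoothing parameters.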


Predictor models for series with seasonality: 

Winters’ method 

Here, in addition to the average and the trend components, a
seasonal component (y_s) is also considered. The trend and
the seasonal components may be linked multiplicatively or
additively, which are modelled using exponential relationship
(Winters, 1960) as follows.

Multiplicative predictor model

Considering the trend and the seasonal components not to be
independent of each other, the model is expressed as

    y(k+p) = (y_{av}(k) + p y_{tr}(k)) y_s(k+p-N) + e(k+p).

The structural components are initialized and updated as
follows.

(a) Initialization of trend and average components: 

Let the period length of the series be 12 (e.g., for yearly
periodic monthly data). Let the centred average of one
complete period preceding and following y(13) be called Y_1
and Y_2, which are given by

    Y_1 = \frac{1}{2} \left[ \frac{y(1)+y(2)+\dots+y(12)}{12}
          + \frac{y(2)+y(3)+\dots+y(13)}{12} \right],    (4.2.4a)

    Y_2 = \frac{1}{2} \left[ \frac{y(13)+\dots+y(24)}{12}
          + \frac{y(14)+\dots+y(25)}{12} \right].    (4.2.4b)

The centres for Y_1 and Y_2 will be y(7) and y(19). The
initial value for the trend component y_{tr}(13) at the 13th
month is given by

    y_{tr}(13) = \frac{Y_2 - Y_1}{12}.    (4.2.5)

The initial value for the average component y_{av}(13) is given
by

    y_{av}(13) = \frac{Y_1 + Y_2}{2},    (4.2.6)

while the same for any other point can be computed using
y_{tr}(13) and Y_1 or Y_2; for example, for the 15th month,

    y_{av}(15) = Y_2 - (19-15) y_{tr}(13).
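
The initialization can be checked on a simple series with a short Python sketch. It assumes the readings Y_1 and Y_2 as averages of two adjacent 12-point means, trend (Y_2 - Y_1)/12 and average (Y_1 + Y_2)/2; the function name is hypothetical and the test series y(k) = k is chosen only because the correct answers are then obvious.

```python
def winters_initial_values(y, period=12):
    """Initial trend and average at the first point of the second period.

    y holds at least 2*period + 1 observations; y[0] is y(1) in the text.
    """
    n = period

    def mean(start):                 # mean of n values starting at y(start)
        return sum(y[start - 1:start - 1 + n]) / n

    big_y1 = 0.5 * (mean(1) + mean(2))          # centred at y(n/2 + 1)
    big_y2 = 0.5 * (mean(n + 1) + mean(n + 2))  # centred one period later
    trend = (big_y2 - big_y1) / n               # (4.2.5)
    average = 0.5 * (big_y1 + big_y2)           # (4.2.6)
    return big_y1, big_y2, trend, average

months = [float(k) for k in range(1, 26)]       # y(k) = k, k = 1..25
y1, y2, y_tr13, y_av13 = winters_initial_values(months)
y_av15 = y2 - (19 - 15) * y_tr13
```

For y(k) = k the centred averages equal y(7) = 7 and y(19) = 19, the trend is 1 per month, and the averages at months 13 and 15 are recovered exactly.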


(b) Initialization of seasonal component: 

To find the initial value for y_s, first the corresponding
average component is extracted; for example, for the i-th
month in the period of interest,

    y_s'(i) = \frac{y(i)}{y_{av}(i)};

the desired initial estimates of the seasonal components are
now computed as

    y_s(i) = y_s'(i) \Big/ \frac{1}{N} \sum_{l=1}^{N} y_s'(l),

N being the period length.

(c) Updating of structural components:

The three components y_av, y_tr and y_s are updated using the
principle of exponential smoothing as

y_av(k) = α (y_av(k-1) + y_tr(k-1)) + (1-α)(y(k)/y_s(k-N)),

y_tr(k) = β y_tr(k-1) + (1-β)(y_av(k) - y_av(k-1)),

y_s(k) = γ y_s(k-N) + (1-γ)(y(k)/y_av(k));

with the chosen smoothing parameters, α, β and γ, lying
in the interval (0,1). Usually, γ is higher than α or β, since
the seasonal component varies relatively slowly.

(d) Prediction: 

The p-step ahead prediction produced at time k is given by

y(k+p|k) = (y_av(k) + p y_tr(k)) y_s(k+p-N).

If p>N, instead of y_s(k+p-N), y_s(k+p-2N) may be used, which
is the best available estimate of the seasonal component
concerned.
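
Steps (a) to (d) of the multiplicative procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `winters_multiplicative` and its arguments are hypothetical, the period N is assumed to be even, the seasonal components are initialized over the first period using the straight line through Y_1 with slope y_tr, and the updating is assumed to start from the sample following that period.

```python
import numpy as np

def winters_multiplicative(y, N, alpha, beta, gamma, p=1):
    """p-step ahead prediction y(k+p|k) from the multiplicative model.

    alpha, beta and gamma weight the OLD estimates, as in the text.
    Assumes N even, len(y) >= 2*N + 1, strictly positive data and p >= 1.
    """
    y = np.asarray(y, dtype=float)
    # (a) Centred averages (4.2.4a), (4.2.4b), written for a general even N.
    Y1 = 0.5 * (y[0:N].mean() + y[1:N + 1].mean())
    Y2 = 0.5 * (y[N:2 * N].mean() + y[N + 1:2 * N + 1].mean())
    tr = (Y2 - Y1) / N                      # initial trend, as in (4.2.5)
    centre1 = N // 2                        # 0-based centre of Y1 (y(7) for N = 12)
    av_line = Y1 + (np.arange(N) - centre1) * tr   # y_av over the first period
    # (b) Seasonal components of the first period, normalized to unit mean.
    s_raw = y[:N] / av_line
    s = list(s_raw / s_raw.mean())          # s[j] is the seasonal estimate for sample j
    av = av_line[N - 1]
    # (c) Exponential-smoothing updates from the second period onwards.
    for k in range(N, len(y)):
        av_new = alpha * (av + tr) + (1 - alpha) * y[k] / s[k - N]
        tr = beta * tr + (1 - beta) * (av_new - av)
        s.append(gamma * s[k - N] + (1 - gamma) * y[k] / av_new)
        av = av_new
    # (d) Prediction, using the latest available seasonal estimate of the same phase.
    k = len(y) - 1
    idx = k + p - N
    while idx > k:
        idx -= N
    return (av + p * tr) * s[idx]
```

For a series that is exactly a constant level times a periodic pattern, or exactly a straight line with no seasonality, this sketch reproduces the future values exactly, which gives a simple sanity check before it is tried on real data.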

Additive predictor model 

Here, the trend and the seasonal components are assumed to
be independent of each other, and the model is expressed as

y(k+p) = (y_av(k) + p y_tr(k)) + y_s(k+p-N) + e(k+p).

The modelling procedure is similar to the multiplicative 
case. The initial seasonal component is computed as follows: 

y_s'(i) = y(i) - y_av(i),   y_s(i) = y_s'(i) - (1/N) Σ_{l=1}^{N} y_s'(l).

y_tr is initialized the same way as before. The updating is
done as follows.

y_av(k) = α (y_av(k-1) + y_tr(k-1)) + (1-α)(y(k) - y_s(k-N)),
y_tr(k) = β y_tr(k-1) + (1-β)(y_av(k) - y_av(k-1)),
y_s(k) = γ y_s(k-N) + (1-γ)(y(k) - y_av(k)).

The multistep prediction is given by

y(k+p|k) = y_av(k) + p y_tr(k) + y_s(k+p-N).

Remarks 

(a) Cost and performance: Exponential smoothing is a simple 
method which is computationally cheap and fast. At the same 
time this method can perform remarkably well in comparison 
with other complex methods. 

(b) Adaptive prediction: The smoothing coefficients, α, β
and γ, may be chosen from prior experience and kept fixed,
particularly when predictions for a large number of time
series are to be performed (e.g., sales prediction for
commodities in a department store). However, the optimum
values of the coefficients may be computed by solving a
minimum mean square prediction-error problem (as in Theil
and Wage, 1964), which may be formulated as a usual least
squares estimation problem.


4.3 BOX AND JENKINS METHOD 
The Box and Jenkins method involves 

(a) transformation of the univariate or multivariate time 
series into stationary time series, and 

(b) modelling and prediction of the transformed data
using a transfer-function model.

A discrete-time linear model of the time series or process
is used. Prior to use, the data series are transformed into
stationary series (discussed in Sec. 2.2.1); this is to
ensure that the probabilistic properties of mean and
variance of the data series remain invariant over time.
Usually suitable time-differencing is performed on the data
sequences to achieve stationarity. Identification of the
model and estimation of the parameters are performed using
standard methods.


4.3.1 Modelling Characteristics 

The process is modelled as a linear filter driven by a white
noise sequence. For example

y(k) = e(k) + c_1 e(k-1);

that is

y(k) = (1 + c_1 q^{-1}) e(k),   (4.3.1)

where {e(k)} is the discrete white noise sequence and {y(k)}
is the time series or process output sequence. Here q^{-1}, the
backward shift operator, is the same as the backward shift
operator B used in the statistics literature, as in

B y(k) = y(k-1) = q^{-1} y(k), or say

B^d y(k) = y(k-d) = q^{-d} y(k).

A generalized model can be expressed as

y(k) + a_1 y(k-1) + a_2 y(k-2) +...+ a_p y(k-p)

    = e(k) + c_1 e(k-1) + c_2 e(k-2) +...+ c_r e(k-r);

that is

A(q^{-1}) y(k) = C(q^{-1}) e(k),

where

A(q^{-1}) = 1 + a_1 q^{-1} + a_2 q^{-2} +...+ a_p q^{-p},

C(q^{-1}) = 1 + c_1 q^{-1} + c_2 q^{-2} +...+ c_r q^{-r}.

Before the data series are used for modelling, the data may 
be subjected to 

(i) nonlinear transformation, and 

(ii) stationarity transformation. 

The purpose of nonlinear transformation (discussed in Sec. 
8.2.4) is to be able to represent the process by a linear 
model. The stationarity transformation is to transform the 
time series into a stationary series through nonseasonal 
and/or seasonal time-differencing. It may be noted that the
data transformation is a way of implicitly incorporating
available information into the data used for modelling.

Nonseasonal time-differencing 

The first time-difference of the series {y(k)} is given by

Y_1(k) = y(k) - y(k-1) = (1-q^{-1}) y(k)

       = Δy(k),   (4.3.2)

where Δ is the unit time-difference operator.

A further stage of first differencing applied to the
series {y(k)} leads to the series

Y_2(k) = Y_1(k) - Y_1(k-1)

       = (y(k)-y(k-1)) - (y(k-1)-y(k-2))

       = y(k) - 2y(k-1) + y(k-2).   (4.3.3)

That is

Y_2(k) = (1-q^{-1})^2 y(k)

       = Δ^2 y(k),   (4.3.4)

which is the expression for second-order differencing.

Thus dth-order differencing is given by

Y_d(k) = (1-q^{-1})^d y(k)

       = Δ^d y(k),   (4.3.4)

which results from d successive (unit) time differences
being performed on the data sequence. In practice, d, the
degree of differencing, rarely needs to exceed 2 for the
differenced series to be close to a stationary process.

A generalized expression for a nonseasonal model is
given by

A(q^{-1}) Δ^d y(k) = C(q^{-1}) e(k),   (4.3.5)

which is an ARIMA model of order (p,d,r), where the
discrete-time polynomials A(q^{-1}) and C(q^{-1}) are of order p
and r respectively.

Remark: In an ARIMA (p,d,r) model (4.3.5), p is the order of the
autoregressive (AR) part, r is the order of the moving
average (MA) part and d is the degree of time-differencing
applied to {y(k)}. Hence, to be precise, if any of p, d or r
is zero, the model is no longer an ARIMA process. So an
ARIMA (p,0,0) is in fact an AR(p) process, and so on.

Seasonal time-differencing 

If the time series {y(k)} is seasonal or periodic in nature
with a period length of N, it is expected that the data
series will be

(i) seasonally related (i.e. y(k) related to y(k-N) etc.),
besides being

(ii) serially related, i.e. y(k), y(k-1), y(k-2), etc. will
be mutually related.

Consider a basic seasonal model

y(k) - y(k-N) = e(k);   (4.3.6)

here the time series {y(k)} is transformed by first degree
of seasonal time-differencing:

y(k) - y(k-N) = (1-q^{-N}) y(k)

              = Δ_N y(k),

where Δ_N (= 1-q^{-N} = 1-B^N) is the seasonal N-difference
operator.

If D-degree of seasonal time-differencing is performed
on the data, a generalized seasonal model may be expressed
as

A(q^{-1}) (1-q^{-N})^D y(k) = C(q^{-1}) e(k).   (4.3.7)

In recognition of the serial correlation of the data, if
additional d-th order differencing is introduced in (4.3.7),
the multiplicative seasonal model or the mixed seasonal and
nonseasonal model is obtained:

A(q^{-1}) (1-q^{-1})^d (1-q^{-N})^D y(k) = C(q^{-1}) e(k);   (4.3.8)

that is

A(q^{-1}) z(k) = C(q^{-1}) e(k), with

z(k) = Δ^d Δ_N^D y(k);   (4.3.9)

where the parameters of A(q^{-1}) and C(q^{-1}) are estimated
using the transformed sequence {z(k)}.
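
The transformation z(k) = Δ^d Δ_N^D y(k) of (4.3.9) can be illustrated with a short Python sketch. The function name `difference` and its argument names are hypothetical; the returned sequence is shorter than the input by d + D·N samples, since the differences are only defined where all the required past values exist.

```python
import numpy as np

def difference(y, d=0, D=0, N=1):
    """Return z(k) = (1 - q^-1)^d (1 - q^-N)^D y(k) where it is defined."""
    z = np.asarray(y, dtype=float)
    for _ in range(D):
        z = z[N:] - z[:-N]      # one stage of seasonal N-differencing
    for _ in range(d):
        z = z[1:] - z[:-1]      # one stage of unit differencing
    return z

# A linear trend plus a period-12 pattern: seasonal differencing leaves a
# constant, and a further unit difference removes that constant too.
k = np.arange(48)
y = 3.0 * k + (k % 12) ** 2
z_seasonal = difference(y, D=1, N=12)        # every element equals 3 * 12
z_mixed = difference(y, d=1, D=1, N=12)      # every element equals 0
```

Applying the two kinds of differencing in either order gives the same result, since the operators Δ and Δ_N commute.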

There may be more than one seasonal component in the
data. For example, in the case of the hourly electrical power
load demand series (Appendix 7D), there will be

(a) a daily seasonal component requiring Δ_24 differencing,
and

(b) a weekly seasonal component requiring Δ_168 differencing.

Hence multiple periodicity of order n can be
accommodated in the model (4.3.9) by substituting

Δ_{N_1}^{D_1} Δ_{N_2}^{D_2} ... Δ_{N_n}^{D_n}   for   Δ_N^D,

where D_1, D_2, ..., D_n are usually unity.


Example 4.3.1(1) Assessment of the presence of periodicity
in the transformed German unemployment series using SVR
spectrum

As discussed in Appendix 11, the presence of a dominant
periodicity in the data series is indicated by a repeating
peak at multiples of the concerned period length in the SVR
spectrum.

Figure 4.3.1 The original and transformed German
unemployment series and their SVR spectra. (a), (b),
(c) show the {y(k)}, {z_1(k)} and {z_2(k)} series, and (d),
(e), (f) show their respective SVR spectra.

The monthly German unemployment data series {y(k)}
(Appendix 7E, Fig. 4.3.1a) shows a trend as well as a
seasonal component of period length 12. Consider the
differencing transformations:

z_1(k) = Δy(k), and z_2(k) = Δ_12 y(k).

Fig. 4.3.1 ((a) to (f)) shows the sequences {y(k)}, {z_1(k)}
and {z_2(k)}, and their SVR spectra. Peaks at multiples of
the period length 12 are seen to be present in the spectra
of both {y(k)} and {z_1(k)}, whereas this peak disappears in
the case of {z_2(k)}. This shows that the yearly periodic
component present in {y(k)} or in {z_1(k)} is extracted
through Δ_12 differencing.

Prediction 

For a process modelled as

A(q^{-1}) Δ^d y(k) = C(q^{-1}) e(k),   (4.3.10)

the one-step ahead minimum mean square error prediction is
defined as the conditional expectation of y(k+1), at time k,
that is

y(k+1|k) = E{y(k+1) | y(k), y(k-1), ...}.

The error sequence {e(k)} may be expressed as

e(k) = y(k) - y(k|k-1),

e(k-1) = y(k-1) - y(k-1|k-2), etc.

Once the parameters are estimated, the predictions can be
computed using (4.3.10), where all future values of the
error term e(k+1), e(k+2) etc. are assumed to be zero, since
E{e(k+1)|k} = 0.


Example 4.3.1(2) Prediction of an ARIMA (1,1,1) process 
Consider the model:

(1 - a_1 q^{-1}) Δy(k) = (1 - c_1 q^{-1}) e(k).

Here,

y(k) = y(k-1) + a_1 y(k-1) - a_1 y(k-2) + e(k) - c_1 e(k-1).

So

y(k+1|k) = (1+a_1) y(k) - a_1 y(k-1) + e(k+1|k) - c_1 e(k),

where e(k+1|k) = 0; e(k) may be approximated by
e(k) = y(k) - y(k|k-1).

The one-step ahead prediction is given by

y(k+1|k) = (1+a_1) y(k) - a_1 y(k-1) - c_1 e(k),

where a_1 and c_1 are the estimated parameter values.

Again, to compute y(k+2|k), form

y(k+2|k) = (1+a_1) y(k+1|k) - a_1 y(k) + e(k+2|k) - c_1 e(k+1|k).

So the two-step ahead prediction is given by

y(k+2|k) = (1+a_1) y(k+1|k) - a_1 y(k),

and so on.

Figure 4.3.2 The airline traffic series (Appendix 7A.2)
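
The recursion of Example 4.3.1(2) can be sketched in Python as follows. The function name `arima111_predict` is hypothetical, and the sketch assumes that the error sequence is reconstructed from the data with a zero initial error, which is one simple choice among several.

```python
def arima111_predict(y, a1, c1, steps):
    """Predictions y(k+1|k), ..., y(k+steps|k) for the model
    (1 - a1 q^-1) (1 - q^-1) y(k) = (1 - c1 q^-1) e(k).

    y must hold at least two samples; k is the index of the last sample.
    """
    # Reconstruct e(k) = y(k) - y(k|k-1), assuming a zero initial error.
    e_last = 0.0
    for k in range(2, len(y)):
        y_hat = (1 + a1) * y[k - 1] - a1 * y[k - 2] - c1 * e_last
        e_last = y[k] - y_hat
    preds = []
    y_now, y_prev = y[-1], y[-2]
    for j in range(steps):
        # Only the one-step prediction uses e(k); future errors are set to zero.
        ma_term = c1 * e_last if j == 0 else 0.0
        y_next = (1 + a1) * y_now - a1 * y_prev - ma_term
        preds.append(y_next)
        y_prev, y_now = y_now, y_next
    return preds
```

With a1 = 0.5 and c1 = 0, the predicted increments halve at each step, so the predictions settle towards a constant level, as expected for a model with a single differencing stage.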


Example 4.3.1(3) AR modelling of the airline traffic series 

This series (see Fig. 4.3.2 and Appendix 7A.2) contains
monthly data over 12 years. The objective is to model the
series using the data for the first 9 years and to produce
the prediction for the next 3 years.

The series has a trend and a yearly periodic component,
and also a diverging pattern. The series is logarithmically
transformed to eliminate the divergence; further, unit
differencing is applied to eliminate the trend component,
and periodic differencing is applied to eliminate the
periodic component. Thus the data series {y(k)} is
transformed to {z(k)} where

z(k) = (1-q^{-1})(1-q^{-12}) log_e y(k).

Figure 4.3.3 One- to 41-step ahead prediction of the
airline series using the subset-AR model (4.3.11b).

A full AR model for {z(k)} can be expressed as

z(k) = a_1 z(k-1) + a_2 z(k-2) + ... + a_n z(k-n) + e(k);

here n, the maximal order, given by the lag for which the
partial autocorrelation function of {z(k)} is reasonably
high, works out to be 12. As discussed in Sec. 3.6.3, the
best subset-AR model can be determined using an information
criterion (like AIC) and subset selection; using the
complete data set {z(k)}, the model works out as

z(k) = -0.3004 z(k-1) - 0.4205 z(k-12) + e(k),

with AIC = -6.477 and the residual variance = 1.46x10^{-3}. If
data up to July of the 9th year are assumed to be available,
the model works out as

z(k) = -0.326 z(k-1) - 0.446 z(k-12) + e(k),   (4.3.11a)

with AIC = -6.322 and the residual variance = 1.66x10^{-3}.
The p-step ahead prediction can be computed from

z(k+p|k) = -0.326 z(k+p-1|k) - 0.446 z(k+p-12|k),   (4.3.11b)

where predicted values (from (4.3.11b)) are used for
z(k+p-12|k) and z(k+p-1|k) if the actual data are not
available. The one- to 41-step ahead prediction result for
the series is shown in Fig. 4.3.3; the corresponding
MSE = 0.0089. The parameters are estimated using the
Marquardt algorithm (Box and Jenkins, 1976, p.504).
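
The multistep recursion (4.3.11b), together with the inversion of the logarithmic and differencing transformations, can be sketched as follows. The function name `predict_log_seasonal` is hypothetical; the default coefficients are those of (4.3.11a), and the inversion step x(k) = z(k) + x(k-1) + x(k-12) - x(k-13), with x = log_e y, is simply the definition of z(k) rearranged. The airline data themselves are not reproduced here, so any input series is the reader's own.

```python
import numpy as np

def predict_log_seasonal(y, steps, a1=-0.326, a12=-0.446):
    """Predict `steps` future values of y, where
    z(k) = (1 - q^-1)(1 - q^-12) log y(k) follows the subset-AR model
    z(k) = a1 z(k-1) + a12 z(k-12) + e(k).

    Assumes positive data, len(y) >= 25 and steps >= 1.
    """
    x = list(np.log(np.asarray(y, dtype=float)))
    # Transformed series, defined from the 14th sample onwards.
    z = [x[k] - x[k - 1] - x[k - 12] + x[k - 13] for k in range(13, len(x))]
    for _ in range(steps):
        # Predicted z values are reused once actual values run out.
        z_next = a1 * z[-1] + a12 * z[-12]
        # Undo the unit and the seasonal differencing.
        x_next = z_next + x[-1] + x[-12] - x[-13]
        z.append(z_next)
        x.append(x_next)
    return np.exp(np.array(x[-steps:]))
```

For a series whose logarithm is exactly a straight line plus a period-12 pattern, z(k) is zero throughout and the sketch simply continues the trend and the pattern.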


4.3.2 Implementation Aspects 

Some necessary conditions for robust implementation of the
formulations presented in Sec. 4.3.1 are discussed here.

Stability and invertibility

Consider a generalized model

A(q^{-1}) z(k) = C(q^{-1}) e(k),

where {z(k)} is the appropriately differenced or filtered
observation sequence {y(k)}. It is necessary that both the
A(q^{-1}) and C(q^{-1}) polynomials are stable; A(q^{-1}) being
stable implies that the output of C(q^{-1})/A(q^{-1}) is bounded
for bounded input {e(k)}, while C(q^{-1}) being stable implies
that the model is
invertible (see Appendix 12A for further discussions).
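
A simple numerical check of these conditions is sketched below. The function name `is_stable` is hypothetical; a polynomial 1 + a_1 q^{-1} + ... + a_n q^{-n} is multiplied by q^n so that its roots in q can be computed with `numpy.roots` and compared with the unit circle.

```python
import numpy as np

def is_stable(coeffs):
    """coeffs = [1, a1, ..., an] for 1 + a1 q^-1 + ... + an q^-n.

    Multiplying by q^n gives q^n + a1 q^(n-1) + ... + an, whose roots
    must all lie strictly inside the unit circle.
    """
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) < 1.0))

stable_a = is_stable([1.0, -1.5, 0.56])    # roots near 0.7 and 0.8
invertible_c = is_stable([1.0, -0.7])      # root at 0.7
unstable = is_stable([1.0, -2.5, 1.0])     # roots at 2 and 0.5
```

The same function applied to C(q^{-1}) tests invertibility, and applied to A(q^{-1}) tests stability of the filter.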

Time-differencing aspects 

The differencing of the data sequences to induce stationarity
needs to be performed carefully.

(a) When the data are noisy:

Spurious noise contamination will increase on differencing.
So unexpected excursions or outliers in the data
should be eliminated before differencing. Alternatively the
data may be low-pass filtered (e.g., T(q^{-1}) filter as in
(2.4.7)) or bidirectionally filtered (Sec. 14.3), before
differencing, to reduce or eliminate the effects of noise.

(b) Over-differencing of the data:

Over-differencing is undesirable, as it leads to noninvertible
models. For example, if a data series incorporates a
deterministic polynomial trend of degree n, nth degree of
differencing will eliminate the trend, producing a stationary
series. Any further differencing will render the model
noninvertible (see Sec. 8.2.1).

An appropriate degree of differencing can be ensured by
the examination of the autocorrelation function, which goes
towards zero as the lag increases. The problem of
over-differencing can be avoided if the roots of C(q^{-1}) are
constrained to lie inside the unit circle.

Parsimonious parameterization 

It is important that the lowest possible model order, or to
be more precise, the minimum number of variables, is used for
modelling. Overfitting of a data set by too many parameters
makes the model unsuitable for data sets which are not used
for modelling, and large modelling errors result.
Over-parameterization may incorporate common factors between
the left and right hand sides of the expression for the
model (e.g., in equations (4.3.5) and (4.3.8)), which may
not be precisely identifiable, leading to erroneous modelling
and poor predictions. In the present context, over-differencing
is akin to over-parameterization.


4.4 OTHER SELECTED METHODS 

Two other popular methods of prediction are the regression 
method and the econometric method. 

Regression method 

The regression method concerns prediction of a dependent 
(endogenous) variable through the relationship with a number 
of independent (exogenous) variables, where the disturbances 
or the uncertainties in the model are statistically defined. 
Regression methods are problem specific and hence can be of 
different types, e.g., linear regression, nonlinear 
regression, multiple linear regression, multivariate 
regression etc. typically as follows. 

Linear regression: 

y(k) = a_0 + a_1 x_1(k) + a_2 x_2(k-1) + a_3 x_3(k-3) + e(k).

Nonlinear regression:

y(k) = a t x(k) + afx(k-2). 

Multiple linear regression: 

y(k) = a_1 x_1(k) + a_2 x_1(k-1) +...+ a_n x_1(k-n)

     + b_1 x_2(k) + b_2 x_2(k-1) +...+ b_n x_2(k-n)

     + c_1 x_3(k) + c_2 x_3(k-1) +...+ c_n x_3(k-n) +...+ e(k).

Multivariate regression:

y_1(k) = a_11 x_1(k) + a_12 x_1(k-1) +...

       + b_11 x_2(k) + b_12 x_2(k-1) +...+ e_1(k),

y_2(k) = a_21 x_1(k) + a_22 x_1(k-1) +...

       + b_21 x_2(k) + b_22 x_2(k-1) +...+ e_2(k).

The symbols used have their usual meanings. 

Two basic issues arise in regression modelling: 

(a) selection of the regressors which are essential for 
representative modelling of the process concerned, and 

(b) the proper estimation of the parameters. 

Each selected regressor is expected to be independent of 
other regressors, and at the same time correlated with the 
dependent variable. A regressor, which is a linear function 
of the other variables, is redundant and hence should be 
eliminated. Regressors which are orthogonal to the output 
are not necessarily redundant. 

For further discussions on regression models see 
Secs. 3.3.2, 3.3.3, 3.6.4, and 9.3. 
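
The two basic issues, redundancy of regressors and estimation of the parameters, can be illustrated with the following sketch. The function name `fit_regression` is hypothetical, ordinary least squares is assumed as the estimator, and a rank test is used here as one simple way of detecting a regressor that is a linear function of the others; the selection methods referred to above are not reproduced.

```python
import numpy as np

def fit_regression(X, y):
    """Least squares fit of y = X theta + e.

    Returns (theta, redundant), where redundant is True when some column
    of X is a linear combination of the other columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    redundant = bool(np.linalg.matrix_rank(X) < X.shape[1])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta, redundant

# Illustrative data: a constant term and two regressors.
k = np.arange(20)
x1, x2 = np.sin(0.3 * k), np.cos(0.7 * k)
X = np.column_stack([np.ones(20), x1, x2])
y = 1.0 + 2.0 * x1 - 3.0 * x2
theta, redundant = fit_regression(X, y)            # redundant is False
# Adding x1 + 2 x2 as a fourth regressor makes the set linearly dependent.
_, redundant_bad = fit_regression(np.column_stack([X, x1 + 2.0 * x2]), y)
```

When a redundant regressor is present, the least squares solution is no longer unique, which is why such a regressor should be eliminated before estimation.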

Econometric method 

Econometrics deals with the study of economic systems or
processes in the statistical framework. Usually an econometric
model has more than one independent (exogenous)
variable and one or more dependent (endogenous) variables.
Thus regression methods can be considered to be a subset of
econometric methods. However, most economic problems involve
relationships between a number of interdependent variables,
i.e. the dependent variables themselves usually have causal
influence on each other (note: the term causal means caused
by). Such relationships are expressed by sets of
simultaneous equations. The expressions relating interdependent
variables through sets of simultaneous equations form a
class of models unique to econometrics. The parameters of
the simultaneous equation models can be estimated using the
two stage least squares or the three stage least squares
method (Theil, 1971; Judge et al., 1982). In practice, ordinary
least squares is used for approximate solutions.

Econometric models are difference equation expressions
which are usually nonlinear and nonstationary in nature. The
models are often too large, and usually there are
uncertainties associated with the specification of the model
equations. This is due to the inadequacy of economic theory
to aggregate different economic variables as well as the
imprecise knowledge of the response patterns. The limited
number of time series observations also causes poor
understanding of the underlying process. In practice, for
large scale models, the judgemental decision based on the
understanding of the process can play a significant role in
the predictions.

Example 

Based on the Keynesian economic model (named after the
economist J.M. Keynes, 1883-1946), the national income can
be expressed by the simplified model (Levi, 1983):

Y = C + I_0 + (F_x0 - F_i),   C = C_0 + cY,   and

F_i = F_i0 + mY,

where Y is the national output or income (GNP), C is the
aggregate consumption of goods, I_0 is the given amount of
investment, F_x0 is the given amount of exports, and F_i is
the imports. The C_0 part of consumption does not depend on
income. The income dependent propensities to consume and to
import are given by c and m respectively. Here the
uncertainty or the equation errors have not been shown. For
more complete macroeconomic models, see Chow (1981).
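
Because the three equations are simultaneous, Y has to be solved for jointly with C and the imports. Substituting the second and third equations into the first gives Y (1 - c + m) = C_0 + I_0 + F_x0 - F_i0, which the following sketch evaluates. The function name `keynes_income` and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def keynes_income(C0, c, I0, Fx0, Fi0, m):
    """Solve Y = C + I0 + (Fx0 - Fi), C = C0 + c Y, Fi = Fi0 + m Y."""
    Y = (C0 + I0 + Fx0 - Fi0) / (1.0 - c + m)   # reduced-form solution for income
    C = C0 + c * Y                               # consumption
    Fi = Fi0 + m * Y                             # imports
    return Y, C, Fi

# Illustrative (made-up) values.
Y, C, Fi = keynes_income(C0=100.0, c=0.8, I0=200.0, Fx0=150.0, Fi0=50.0, m=0.2)
```

The factor 1/(1 - c + m) shows how a change in the given investment or exports is amplified in the income, and how a higher propensity to import reduces that amplification.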

Remarks 

(1) Econometric models tend to be large in size. It may be 
necessary to compromise between the prior knowledge of the 
underlying economic process and the statistical 
justification coupled with assessment of redundancy. 

(2) The noise associated with data may be due to 
unrepresentative sampling or measurement as well as concep- 
tual misunderstanding. 

(3) Economic processes usually have seasonality which may
be removed from the data before modelling; both the seasonal
and the trend components may be modelled as nondeterministic
terms.

(4) Time series forecasting assumes an open-loop
architecture, whereas econometric modelling and forecasting
assume a closed-loop architecture. The problem of solving
simultaneous equations appearing in econometric models is
analogous to that found in the case of multivariable control
problems having cross coupling between different inputs and
outputs.


4.5 CONCLUDING REMARKS 

The main features of some of the methods of modelling and 
prediction, which are popular with time series analysts and 
forecasters, have been briefly discussed. 

The smoothing methods of modelling are computationally
cheap and robust; these are applicable to series with
limited dynamics. The low-pass filtering implicit in the
smoothing methods results in (a) the elimination of the
spurious disturbances which usually appear as outliers,
and (b) a time lag in the smoothed data. The time lag
problem can be remedied by using centred moving averaging
or by using bidirectional smoothing or fixed interval
smoothing.

It may be noted that in systems engineering, the
term smoothing stands for the estimation of the
values of the series {y(k)}, y(k+i|k) where i≤0 (see
Sec. 14.2), whereas in the present context of time series
analysis i>0.

The Box and Jenkins modelling approach involves
appropriate transformation of the data to improve the
stationarity. Periodic information can be incorporated into
the transformed series through periodic differencing of
the data. There are various ways of transforming the data,
and reasonably good predictions can be produced using the
Box and Jenkins method for a variety of processes. Some care
is necessary in the data differencing procedure; spurious
disturbances or high frequency noise are exaggerated
through the differencing of the data and hence should be
removed separately, say by outlier rejection or prior
low-pass filtering of the data. It may also be noted
that in the method of constrained prediction discussed in
Chapter 5, the appropriate increments are constrained and
thereby the extent of noise remains contained.

The regression model remains one of the popular methods 
of modelling and prediction. For proper representation of 
the underlying process, linearly dependent or redundant 
regressors should be eliminated. 

The econometric models are comparatively difficult to
construct because of the inherent feedback between the
variables of interest. The modelling problem will be
simplified if the influences of different variables can be
decoupled.


CHAPTER 5


ADAPTIVE PREDICTION USING TRANSFER-FUNCTION MODELS 


The prediction can be more meaningful when properly 
constrained, based on process knowledge. 


5.1 INTRODUCTION 

Adaptive prediction is usually based on minimization of the 
mean square prediction error. The prediction involves a two 
stage procedure: 

(i) estimation of the parameters of an appropriate model of 
the time series or the process, and 

(ii) reconfiguration of the process model into a prediction 
model, and computation of prediction using the 
estimated parameters. 

Usually the second stage is trivial and the quality of
prediction depends solely on the quality or representativeness
of the model.

In practice, the prediction may not always be sensible, 
some of the reasons for which can be explained as follows. 

(a) It is implicitly assumed that the available or measured 
data are exact, or are corrupted by a white noise sequence. 
However, if the time series is the measured output of an 
industrial plant, the desired signal may not be the actual 
measurement but some underlying process variable. The 
measurement itself may be contaminated with significant 
noise of bandwidth overlapping with the signal. The 
measurement may also be indirect and deduced from other 
measurements (consider for example, the measurement of the 
product strength in the iron ore sintering process, 
discussed in Sec.5.5.1). 

The minimum mean square error predictor attempts to 
track the measured data and hence will be inevitably 
susceptible to disturbances associated with the data. 

(b) The variable being predicted may have distinctive
characteristics which should be embedded in the predictor as
prior information. For example, the signal may be inherently
rate limited (e.g., the movement of a ship or, say, the
monetary inflation of a country). The conventional predictors
can incorporate such features in the model only, and
not in the prediction algorithm.

For an ideal predictor, it is important that misleading 
information in the data is ignored and the underlying 
process variable is followed with due regard to the probable 
future values, which are based on the user’s prior knowledge 
of the process. Assimilation of the subjective knowledge in 
the prediction algorithm is expected to result in sensible 
and meaningful predictions. This chapter presents a
constrained mean square prediction error strategy based on
penalization of both the prediction error and the increments
of prediction; the latter is added to accommodate prior
process knowledge into the prediction algorithm.

The commonly used minimum mean square error prediction
scheme is discussed in Sec. 5.2. The concept of constrained
mean square error prediction is explained in Sec. 5.3, and
the prediction algorithms are produced. A recursive method
of extending optimal prediction to multistep predictions is
discussed in Sec. 5.4. Sec. 5.5 is devoted to a case study on
the application of adaptive prediction to the iron ore
sintering process. This chapter is supported by two appendices;
the recursive solution of the Diophantine identity,
used for multistep prediction, is presented in Appendix 5A,
and the formulation of a multivariable predictor is
discussed in Appendix 5B.


5.2 MINIMUM MEAN SQUARE ERROR PREDICTION 

The minimum mean square error prediction is based on the 
minimization of the cost function 

J_m = E{e^2(k+p)},

where the prediction error,

e(k+p) = y(k+p) - y(k+p|k),

y(k+p|k) is the p-step ahead prediction of the output y at
time k; the objective is to produce y(k+p|k).

Figure 5.2.1 Adaptive prediction schemes using
(a) Explicit (indirect) approach, and
(b) Implicit (direct) approach.


In the present context, there are two basic methods of 
prediction (Fig.5.2.1): 

(i) Explicit or indirect method: The process model is
estimated from the time series or the input-output data. A
prediction model is formed using the estimated parameters
and the multistep predictions are produced.

(ii) Implicit or direct method: A predictor model is
considered, the parameters of which are directly estimated using
the process data; the predictor model implicitly represents
the process.


5.2.1 Explicit (Indirect) Prediction 

ARMAX model based design 

Consider the process model 

A(q^{-1}) y(k) = B(q^{-1}) u(k-d) + C(q^{-1}) e(k),   (5.2.1)

where y is the output of the process, u is the known input
to the system, and {e(k)} is an uncorrelated random noise
sequence. A, B and C are polynomials in the discrete backward
shift operator q^{-1}:

A(q^{-1}) = 1 + a_1 q^{-1} + a_2 q^{-2} + ... + a_n q^{-n},

B(q^{-1}) = b_0 + b_1 q^{-1} + b_2 q^{-2} + ... + b_n q^{-n},

C(q^{-1}) = 1 + c_1 q^{-1} + c_2 q^{-2} + ... + c_n q^{-n},

where q^{-1} y(k) = y(k-1). It is assumed that the input and the
output sequences are mean square bounded and that the roots
of C(q^{-1}) lie inside the unit circle.

Remark: The roots of C(q^{-1}) lying inside the unit circle are
equivalent to the roots of the reciprocal polynomial C(q)
lying outside the unit circle, which means that the solution
of C(q^{-1}) = 0 should lie inside the unit circle (i.e.
|q|<1). For example, if C(q^{-1}) = 1-0.7q^{-1}, the root is at
0.7, or equivalently C(q) = 1-0.7q, having the root at
1/0.7; here C(q^{-1}) or C(q) is said to be stable.

To compute the p-step ahead prediction, define the identity
(also referred to as the Diophantine identity):

C(q^{-1}) = E_p(q^{-1}) A(q^{-1}) + q^{-p} F_p(q^{-1}),   (5.2.2)

where the degree of E_p(q^{-1}), δE_p = p-1, and δF_p < δA:

E_p(q^{-1}) = 1 + e_1 q^{-1} + ... + e_{p-1} q^{-p+1},

F_p(q^{-1}) = f_0 + f_1 q^{-1} + ... + f_{n-1} q^{-n+1}.

Following (5.2.1) and (5.2.2), multiplying (5.2.1) at time
k+p by E_p(q^{-1}) gives

E_p(q^{-1}) A(q^{-1}) y(k+p) = E_p(q^{-1}) B(q^{-1}) u(k+p-d)

        + E_p(q^{-1}) C(q^{-1}) e(k+p).

Using (5.2.2),

C(q^{-1}) y(k+p) = F_p(q^{-1}) y(k) + E_p(q^{-1}) B(q^{-1}) u(k+p-d)

        + E_p(q^{-1}) C(q^{-1}) e(k+p).

Hence

y(k+p) = [F_p(q^{-1})/C(q^{-1})] y(k) + [E_p(q^{-1}) B(q^{-1})/C(q^{-1})] u(k+p-d)

        + E_p(q^{-1}) e(k+p).

Taking conditional expectation on both sides,

E{y(k+p)|k} = E{[F_p(q^{-1})/C(q^{-1})] y(k) | k}

        + E{[E_p(q^{-1}) B(q^{-1})/C(q^{-1})] u(k+p-d) | k}

        + E{E_p(q^{-1}) e(k+p) | k}.

Since E_p(q^{-1}) is of degree p-1,

E{E_p(q^{-1}) e(k+p) | k} = 0;

again by definition E{y(k+p)|k} = y(k+p|k), and since the
inputs u(k+p-d), u(k+p-d-1), etc. are assumed to be known,
the minimum mean square error prediction is given by

C(q^{-1}) y(k+p|k) = F_p(q^{-1}) y(k) + E_p(q^{-1}) B(q^{-1}) u(k+p-d).   (5.2.3)

The prediction error is given by

e(k+p|k) = y(k+p) - y(k+p|k) = E_p(q^{-1}) e(k+p).

So the prediction error will have the following statistical
properties:

(i) Mean: E{e(k+p|k)} = 0.

(ii) Variance: E{e(k+p|k)^2} = (1 + e_1^2 + e_2^2 +...+ e_{p-1}^2) σ^2,

where

E{e^2(k+p) | y(k), y(k-1),...} = σ^2.




Summarizing the p-step explicit prediction procedure: 

(1) Estimate the parameters of the model (5.2.1). 

(2) For each p, compute the parameters of E_p(q^{-1}) and F_p(q^{-1})
from the identity (5.2.2) (see Appendix 5A for a recursive
algorithm and its implementation).

(3) Compute the optimal prediction y(k+p|k) using the 
estimated parameters from (5.2.3). 
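
Step (2) can be illustrated by solving the identity (5.2.2) through polynomial long division of C(q^{-1}) by A(q^{-1}). This is only a sketch: it is not the recursive algorithm of Appendix 5A, and the function name `diophantine` and the example coefficients are hypothetical. Polynomials are stored as coefficient lists in increasing powers of q^{-1}, with A assumed to be monic.

```python
import numpy as np

def diophantine(A, C, p):
    """Solve C = E A + q^-p F for E (p coefficients) and F.

    A and C are coefficient lists in increasing powers of q^-1, A[0] == 1.
    """
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    n = len(A) - 1
    r = np.zeros(max(len(C), p + n))   # running remainder of the division
    r[:len(C)] = C
    E = np.zeros(p)
    for i in range(p):
        E[i] = r[i]                    # next quotient coefficient
        r[i:i + n + 1] -= E[i] * A     # cancels the q^-i term of the remainder
    return E, r[p:]                    # remainder is q^-p F

# A = 1 - 0.9 q^-1, C = 1 + 0.5 q^-1, two-step ahead prediction.
E, F = diophantine([1.0, -0.9], [1.0, 0.5], p=2)
# Prediction-error variance factor (1 + e_1^2 + ... + e_{p-1}^2).
variance_factor = float(np.sum(E ** 2))
```

For this example E_2 = 1 + 1.4 q^{-1} and F_2 = 1.26, so the two-step prediction error variance is 2.96 times the noise variance σ^2.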

Remarks 

In most practical cases, the noise in (5.2.1) may be due to
sources external to the process or due to other
uncertainties in the model. The correct estimation of the
parameters of C(q^{-1}) is difficult unless the disturbances
are frequent and have well defined statistical properties,
which is rare in practice. Hence C(q^{-1}) may be assumed to be
1, implying A(q^{-1})/C(q^{-1}) and B(q^{-1})/C(q^{-1}) being absorbed
into A(q^{-1}) and B(q^{-1}) respectively. Alternatively, a fixed
noise observer polynomial T(q^{-1}) may be used to represent
prior knowledge about the noise process, as explained in
Sec. 2.4.2. A typical choice for T(q^{-1}) is a first order
polynomial, say T(q^{-1}) = 1-0.7q^{-1}. T(q^{-1}) replaces C(q^{-1}) in
the identity (5.2.2), and the predictor (5.2.3) becomes

y(k+p|k) = F_p(q^{-1}) y_f(k) + E_p(q^{-1}) B(q^{-1}) u_f(k+p-d),

where y_f(k) = y(k)/T(q^{-1}), and u_f(k) = u(k)/T(q^{-1}).

ARIMAX model based design 

Consider the process model 

A(q^{-1}) y(k) = B(q^{-1}) u(k-d) + C(q^{-1}) e(k)/Δ,   Δ = 1-q^{-1}.

Introduce the identity

C(q^{-1}) = E_p(q^{-1}) Δ A(q^{-1}) + q^{-p} F_p(q^{-1}).

Following the same procedure as for the ARMAX model, the optimal
p-step ahead prediction is given by

C(q^{-1}) y(k+p|k) = F_p(q^{-1}) y(k) + E_p(q^{-1}) B(q^{-1}) Δu(k+p-d).

The prediction error is given by

e(k+p|k) = E_p(q^{-1}) e(k+p).


5.2.2 Implicit (Direct) Prediction

ARMAX model based design 

Consider the process model (5.2.1): 

$A(q^{-1})y(k) = B(q^{-1})u(k-d) + C(q^{-1})e(k)$. 

Following (5.2.3), and dropping the argument $(q^{-1})$ for clarity, 

$\hat{y}(k|k-p) = (1-C)\,\hat{y}(k|k-p) + F_p\,y(k-p) + E_pB\,u(k-d)$ 

$\qquad\qquad = \phi^T(k-p)\,\theta(k-p)$,    (5.2.4) 

where 

$\phi^T(k-p) = [-\hat{y}(k-1|k-p-1), \ldots, -\hat{y}(k-n|k-p-n),\ y(k-p), \ldots, y(k-p-n+1),\ u(k-d), \ldots, u(k-d-p-n+1)]$, 

(5.2.5) 

$\theta$ being the associated parameter vector. The parameters of 
the predictor can be estimated from 

$\hat{\theta}(k) = \hat{\theta}(k-1) + k(k)\,\varepsilon(k)$,    (5.2.6) 

where 

$\varepsilon(k) = y(k) - \hat{y}(k|k-p)$ 

$\qquad = y(k) - \phi^T(k-p)\,\hat{\theta}(k-1)$,    (5.2.7) 

and k(k) is the Kalman gain of the estimator (as discussed 
in Sec.3.4.1). 

Note that $\hat{y}(k|k-p)$ in (5.2.7) is computed using posterior 
parameter values $\hat{\theta}(k-1)$, and $\hat{y}(k-1|k-p-1)$ etc. in 
(5.2.5) can also be computed in the same way. 

Summarizing the prediction procedure: 

(i) Estimate predictor parameters $\hat{\theta}(k)$ using (5.2.6). 

(ii) Compute the p-step ahead prediction 

$\hat{y}(k+p|k) = \hat{\theta}^T(k)\,\phi(k)$.    (5.2.8) 

A comparative study of the explicit and the implicit methods 
of prediction is presented in Sec.5.3.4. 
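
The two steps of the procedure can be sketched as follows. This is an illustration under stated assumptions and not a definitive implementation: the test process, the horizon p = 2, the delay d = 2 and the function name `rls_step` are all assumed here, and C = 1 is taken so that the $(1-C)\hat{y}$ terms drop out of the data vector. A plain recursive least squares update (no forgetting) stands in for the estimator of Sec.3.4.1.

```python
import random


def rls_step(theta, P, phi, y):
    """One recursive least squares update; returns the new (theta, P)."""
    n = len(theta)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(phi[i] * Pphi[i] for i in range(n))
    gain = [v / denom for v in Pphi]                         # k(k)
    eps = y - sum(phi[i] * theta[i] for i in range(n))       # cf. (5.2.7)
    theta = [theta[i] + gain[i] * eps for i in range(n)]     # cf. (5.2.6)
    P = [[P[i][j] - gain[i] * Pphi[j] for j in range(n)] for i in range(n)]
    return theta, P


# Assumed noise-free test process: y(k) = 0.8 y(k-1) + 0.5 u(k-2).
# Its 2-step form is y(k) = 0.64 y(k-2) + 0.5 u(k-2) + 0.4 u(k-3).
random.seed(0)
N, p = 300, 2
u = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [0.0] * N
for k in range(2, N):
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 2]

theta = [0.0, 0.0, 0.0]
P = [[1e6 if i == j else 0.0 for j in range(3)] for i in range(3)]
for k in range(3, N - p):
    phi = [y[k - p], u[k - 2], u[k - 3]]     # data available at time k-p
    theta, P = rls_step(theta, P, phi, y[k])

# Step (ii): p-step ahead prediction from the data available at time k.
k = N - p - 1
y_pred = sum(t * x for t, x in zip(theta, [y[k], u[k], u[k - 1]]))  # cf. (5.2.8)
```

Note that the predictor parameters are estimated directly; the process polynomials A and B are never identified separately, which is the distinguishing feature of the implicit approach.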

Self-tuning property 

In both the explicit and the implicit approaches, the 
prediction problem is solved by considering a separation 
between the estimation of the parameters used in the 
predictor and computation of the predicted values. The 
estimated parameters are used in the predictor, as if they 
were the true ones; this concept is known as certainty 
equivalence. Eventually, as the parameters converge to the 
true values, the predictor will also converge to the optimal 
minimum mean square error predictor; this is the self-tuning 
property of the predictor. The self-tuning property is 
further discussed in Sec.12.2.1. 

Figure 5.2.2 8-min ahead implicit prediction of 
temperature with discontinuity of service. 


Example 5.2.2 Prediction of waste gas temperature in the 
sintering process 

A detailed description of the sintering process is given in 
Sec.5.5.1. The waste gas (WG) temperature being primarily 
related to the strand speed, the prediction problem concerns 
a one input (i.e. strand speed, u) and one output (i.e. WG 
temperature, y) process. The u and y data are available 
every 2 minutes. Consider using the implicit predictor 
(5.2.8). 

Firstly, the time delay between a change in u and the 
consequent response in y is determined using historical data 
as 8 minutes (so d = 4). The predictor (5.2.4) is assumed to 
have two $E(q^{-1})$ and two $F(q^{-1})$ parameters. The mean and 
variance extracted data are used to estimate the parameters 
in (5.2.4) using the recursive least squares method. The 
4-step ahead (i.e. 8 min ahead) prediction is produced, as 
shown in Fig.5.2.2. 

In the present case, there was a brief discontinuity of 
service when the strand stopped moving towards the end of 
the 10th hour. As the speed signal (u) dropped to an 
abnormally low value, the parameters were frozen and the 
parameter estimation was suspended. The updating of the 
measurement vector continued. The predicted value of WG 
temperature was assumed to be the current value. After the 
operation was restored, the steady parameter values before 
stoppage were used for prediction until the newly estimated 
parameters resettled. 
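
The handling of the stoppage can be sketched as a simple guard around the predictor. This is only an illustration of the logic described above: the threshold `u_min`, the function name `guarded_prediction` and the numerical values are hypothetical, and the reuse of the steady pre-stoppage parameters after restoration is not shown.

```python
def guarded_prediction(u_k, y_k, phi, theta, u_min):
    """Return (prediction, update_allowed) for the current sample.

    u_min is a hypothetical threshold below which the strand is taken
    to have stopped; it is an assumption of this sketch.
    """
    if u_k < u_min:
        # Discontinuity of service: parameters stay frozen, estimation is
        # suspended, and the prediction is set to the current output.
        return y_k, False
    # Normal operation: usual predictor, estimator update allowed.
    return sum(t * x for t, x in zip(theta, phi)), True


theta = [0.9, 0.2]                                   # illustrative parameters
stopped = guarded_prediction(0.05, 340.0, [340.0, 0.05], theta, u_min=0.5)
running = guarded_prediction(2.0, 340.0, [340.0, 2.0], theta, u_min=0.5)
```

In either mode the data vector is still refreshed with the latest measurements by the caller, in line with the statement that the updating of the measurement vector continued.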


5.3 CONSTRAINED MEAN SQUARE ERROR PREDICTION 

One of the ways of enriching the prediction algorithm with 
subjective knowledge is to use constrained minimum mean 
square error cost function, where in addition to the predic- 
tion error, the prediction increments are also costed. The 
consequent predictions are expected to be robust and 
meaningful. 


5.3.1 Why Constrain Prediction 

There are inherent constraints on the performance of most 
real-life processes. For example, the maximum or minimum 
values that the output can reach can be bounded, or say the 
maximum rate at which a variable can change with time may be 
limited etc. The prediction will be meaningful if such 
constraints of the process are not disregarded. Consider the 
following examples. 

(a) If the process is inherently rate limited, there is 
sense in constraining the sequential increments in multistep 
prediction. For example, the silicon content in molten iron 
can change only within certain limits during a tapping 
operation of the blast furnace. 

(b) Some processes show strong periodical links, which may 
justify constrained prediction. For example, in the case of 
power-load forecasting, the load during the following few 
hours of a particular day may not be very different from the 
load at the same time in the previous week; hence in 
computing the hourly predictions, it would be wise to 
penalize the difference of the load with that at the same 
time in the previous week. Again, additional constraints on 
sequential increments of the predicted load demand may also 
be imposed. 

If the process is deterministic and the model is 
representative, the constraints inherent with the process 
may be built into the model. However for most cases, the 
model itself may not adequately incorporate the limitations 
of the process. 

The minimum mean square error predictor expects all the 
characteristics of the process to be incorporated in the 
model. In addition to the process being correctly modelled, 
the measurements have to be noise free in order that 
sensible prediction can be produced. The additional costing 
on the prediction increments can protect a predictor from 
the influence of the noise associated with the data; at the 
same time, the user can incorporate specific subjective 
knowledge about the process in the predictor through the 
proper choice of the prediction increments to be costed, and 
thereby improve the quality of prediction. 


5.3.2 Cost Criteria 

In Fig.5.3.1(a), let $a_1$, $a_2$, $a_3$ and $a_4$ be the 1-, 2-, 3- and 
4-step ahead predictions respectively produced at time T-1, 
and $b_1$, $b_2$, $b_3$ and $b_4$ be similar predictions produced at 
time T. Here T, T-1, ... etc. are the prediction intervals, 
and k, k+1, ... etc. are the sampling intervals or the time 
steps over which predictions are performed. In the case of a 
periodic process (Fig.5.3.1(b)), T and T-1 may relate to 
consecutive periods where $a_1$, $a_2$, $a_3$ and $a_4$ are the 
corresponding actual outputs pertaining to the last period 
T-1. In both the cases, the predictions $b_1$, $b_2$, $b_3$ and $b_4$ 
may be evaluated at time T with constraint imposed on the 
prediction increments as follows: 

(a) $J_{c1}$ costing where delayed increments between $a_1$ and $b_1$, 
$a_2$ and $b_2$, $a_3$ and $b_3$, and $a_4$ and $b_4$ etc. are penalized. 

For example, in the case of electric power load on a 
substation (see Example 7.5.2) $a_1$, $a_2$, $a_3$ and $a_4$ may be the 
hourly load measured at the same time during a previous 
week, where $b_1$, $b_2$, $b_3$ and $b_4$ are the hourly predictions 
being produced in the present week. Since the power 
consumption for a particular day of the week is expected to 
follow a certain pattern, it is wise to cost the differences 
between the hourly predictions of consecutive weeks. 

Figure 5.3.1 (a) Typical multistep prediction of a time 
series (b) Typical multi-step prediction for a periodic 
process. 

(b) $J_{c2}$ costing where sequential increments between $b_0$ and 
$b_1$, $b_1$ and $b_2$, $b_2$ and $b_3$, $b_3$ and $b_4$ etc. are penalized, $b_0$ 
being the present measurement. 

For example, in the prediction of the temperature of a 
soaking pit (discussed in Sec.6.8) the change in the 
temperature of the pit over a certain length of time (say 
ten minutes) is expected to be within certain limits (say 
within ±20°C), because of the high time-constant of the 
process. Hence the sequential increments may be costed 
accordingly. 

(c) $J_{c3}$ costing where positional differences between $a_2$ and 
$b_1$, $a_3$ and $b_2$, $a_4$ and $b_3$ etc. are penalized. 

For example, consider the case of the prediction of the 
quality of sinter (Sec.5.5), where hourly predictions are 
produced. In evaluating the prediction for a particular time 
instant in the future, it makes sense to take into account 
the prediction made at the preceding hour for the same 
instant, since the process characteristics are not expected 
to change by a great deal in one hour. 

It is clear that the prediction increments that should be 
constrained and the time increments that should be 
considered are dependent on the nature of the process. 
Consider a generalized cost criterion 

$J_c = J_m + J_{inc}$,    (5.3.1) 

where by definition, 

$J_m = E\{(\varepsilon(k+p))^2\}$,  $\varepsilon(k+p) = y(k+p) - \hat{y}(k+p|k)$, 

and $J_{inc}$ is the cumulative costing on (say n) different 
types of prediction increments: 

$J_{inc} = E\{(\varepsilon_{inc})^2\} = E\{\sum_{i=1}^{n} \lambda_i(\varepsilon_i)^2\}$.    (5.3.2) 

$\lambda_i$ is a scalar constant. Thus through $J_{inc}$, the cost 
function $J_c$ permits simultaneous penalization of different 
types of prediction increments $\varepsilon_{inc}$. 

If only delayed prediction increments for a periodic 
type process are to be penalized (as discussed in (a) 
above), the cost becomes 

$J_{c1} = E\{(\varepsilon(k+p))^2 + \lambda_1\varepsilon_1^2\}$, 

where 

$\varepsilon_1 = \hat{y}(k+p|k) - \hat{y}(k+p|T-1)$,    (5.3.3) 

$\hat{y}(k+p|k)$ being the same as $\hat{y}(k+p|T)$. 

If only sequential prediction increments are to be 
penalized as discussed in (b) above, the cost becomes 

$J_{c2} = E\{(\varepsilon(k+p))^2 + \lambda_2\varepsilon_2^2\}$, 

where 

$\varepsilon_2 = \hat{y}(k+p|k) - \hat{y}(k+p-1|k)$.    (5.3.4) 

If only positional differences in prediction are to be 
costed, as discussed in (c) above, the cost becomes 

$J_{c3} = E\{(\varepsilon(k+p))^2 + \lambda_3\varepsilon_3^2\}$, 

where 

$\varepsilon_3 = \hat{y}(k+p|k) - \hat{y}(k+p|k-1)$.    (5.3.5) 

Thus $\varepsilon_{inc}$ in (5.3.2) can be appropriately defined by the 
designer, who can also consider a combination of different 
types of prediction increments which will additively 
constitute $\varepsilon_{inc}$. 

Minimization of the cost (5.3.1) leads to 

$\partial J_c / \partial\hat{y}(k+p|k) = -2\,(y(k+p) - \hat{y}(k+p|k)) + 2\sum_{i=1}^{n}\lambda_i\varepsilon_i = 0$; 

$\varepsilon(k+p) = \sum_{i=1}^{n}\lambda_i\varepsilon_i$.    (5.3.6) 

Remarks 

The different cost criteria offer different features and are 
applicable to different types of processes; hence a general 
comparison is not possible. However, the following features 
may be noted. In the case of the costing $J_{c1}$ and $J_{c3}$, the 
prediction inaccuracy due to a sudden disturbance would 
propagate through the prediction intervals (for example T-1 
to T), whereas in the case of the costing $J_{c2}$, where 
sequential increments are costed, the effect of a similar 
disturbance would be rejected in one prediction interval. In 
all the three cases the effect of disturbance would be 
reduced. 


5.3.3 Prediction Formulations 

Explicit approach 
Consider the ARIMAX model 

$A(q^{-1})y(k) = B(q^{-1})u(k-d) + C(q^{-1})e(k)/\Delta$,    (5.3.7) 

where $C(q^{-1})$ is a stable polynomial; it is assumed that the 
known input u, the output y and the noise are mean square 
bounded. 

Since the correct estimation of the parameters of 
$C(q^{-1})$ is difficult, let a (known) noise observer polynomial 
$T(q^{-1})$ be used. Define the identity 

$T(q^{-1}) = E_p(q^{-1})\,\Delta A(q^{-1}) + q^{-p}F_p(q^{-1})$,    (5.3.8) 

where 

$E_p(q^{-1}) = 1 + e_1q^{-1} + \ldots + e_{p-1}q^{-p+1}$, and 

$F_p(q^{-1}) = f_0 + f_1q^{-1} + \ldots + f_nq^{-n}$; 

the degree of $E_p$ is $\delta E_p = p-1$, and $\delta F_p < \delta(\Delta A)$. 

From (5.3.7) and (5.3.8), omitting the argument $(q^{-1})$ for 
simplicity, 

$y(k+p) = F_p\,y(k)/T + E_pB\,\Delta u(k+p-d)/T + E_p\,e(k+p)$;    (5.3.9) 

since $E_p$ is of degree p-1, $E_p\,e(k+p)$ is independent of the 
rest of the terms. Hence the constrained minimum mean 
square error prediction is given by 

$\hat{y}(k+p|k) + \varepsilon(k+p) = F_p\,y(k)/T + E_pB\,\Delta u(k+p-d)/T$,    (5.3.10) 

where the prediction error $\varepsilon(k+p)$ is given by (5.3.6). 

Summarizing the constrained explicit prediction procedure: 

(1) Estimate the process parameters in 

$A(q^{-1})\,\Delta y_f(k) = B(q^{-1})\,\Delta u_f(k-d) + e(k)$,  ($y_f = y/T$) 

using the recursive least squares (RLS) method. 

(2) For each value of p, determine the parameters $E_p(q^{-1})$ 
and $F_p(q^{-1})$ from the identity (5.3.8). 

(3) Define the constraint on the prediction increments 
(5.3.6). 

(4) Compute the p-step prediction $\hat{y}(k+p|k)$ using (5.3.10). 
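
Steps (3) and (4) can be illustrated for the case of sequential increment costing alone. With $\varepsilon_2 = \hat{y}(k+p|k) - \hat{y}(k+p-1|k)$ substituted from (5.3.4) into (5.3.6), the predictor (5.3.10) can be solved for the constrained prediction as $\hat{y}(k+p|k) = (\hat{y}_0 + \lambda_2\,\hat{y}(k+p-1|k))/(1+\lambda_2)$, where $\hat{y}_0$ denotes the right-hand side of (5.3.10). The symbol $\hat{y}_0$, the function name `constrained_sequence` and the numbers below are introduced here for illustration only; this is a sketch, not a definitive implementation.

```python
def constrained_sequence(y_now, unconstrained, lam):
    """Sequential-increment (J_c2 type) costing for p = 1, 2, ..., N.

    y_now         : present measurement, used as the 0-step value b_0
    unconstrained : right-hand side of (5.3.10) for p = 1..N
    lam           : cost weight on the sequential increment
    """
    preds, prev = [], y_now
    for y0 in unconstrained:
        # Solve  y_hat + lam*(y_hat - prev) = y0  for y_hat.
        prev = (y0 + lam * prev) / (1.0 + lam)
        preds.append(prev)
    return preds


# Assumed numbers: the output is at 10 and the unconstrained predictor
# jumps to 14 for every horizon after a sudden disturbance.
preds = constrained_sequence(10.0, [14.0, 14.0, 14.0], lam=1.0)
print(preds)   # [12.0, 13.0, 13.5]
```

With the weight set to zero the unconstrained prediction is recovered, whereas a large weight holds successive predictions close to the present measurement.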

Implicit approach 

The parameters of the predictor in (5.3.9) can be directly 
estimated from 

$y(k) = F_p\,y(k-p)/T + G_p\,\Delta u(k-d)/T + E_p\,e(k)$,    (5.3.11) 

where $G_p = E_pB$, $\delta E_p = p-1$, $\delta G_p = \delta E_p + \delta B = n+p-1$, 
$\delta F_p = \delta A$. 

Note that the data in (5.3.11) are not properly 
balanced, as the output data are positional or actual, 
whereas the input data are incremental; this is likely to 
affect the quality of estimation. Hence introduce the 
identity 

$F_p(q^{-1}) = T(q^{-1}) + \Delta F_p'(q^{-1})$.    (5.3.12) 

So the estimator (5.3.11) may now be expressed as 

$y(k) - y(k-p) = F_p'\,\Delta y_f(k-p) + G_p\,\Delta u_f(k-d) + E_p\,e(k)$, 

(5.3.13) 

where $y_f = y/T(q^{-1})$ and $u_f = u/T(q^{-1})$ and the degree of $F_p'$, 
$\delta F_p' = \delta F_p - 1$. 

When the prediction horizon p is greater than the time 
delay d, the noise term in (5.3.13) is no longer 
uncorrelated with the data; this can be avoided in order to 
use the RLS estimation method, by modifying the time 
indexing as follows: 

$y(k+p-N) - y(k-N)$ 

$\qquad = F_p'\,\Delta y_f(k-N) + G_p\,\Delta u_f(k+p-d-N) + E_p\,e(k+p-N)$, 

(5.3.14) 

where N is the maximum length of the prediction horizon. 

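The p-step ahead implicit predictor, obtained by advancing 
(5.3.13) by p steps, is given by 

$\hat{y}(k+p|k) + \varepsilon(k+p)$ 

$\qquad = y(k) + F_p'\,\Delta y_f(k) + G_p\,\Delta u_f(k+p-d)$,    (5.3.15) 

where the prediction error $\varepsilon(k+p)$ is given by (5.3.6). 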


Example 5.3.3 Explicit prediction with $J_{c1}$ costing 

A simulated process, controlled by an LQG controller 
(discussed in Sec.13.6) is considered (note that the 
prediction procedures are independent of the control policy, 
if any, in use). The future values for the control input are 
assumed to remain unchanged at the latest known value. An 
RLS estimator with a forgetting factor of 0.99 is used and 
no prior knowledge of plant parameters is assumed. 

Consider the model 

$(1 - 1.7q^{-1} + 0.7q^{-2})\,y(k) = (1 - 1.5q^{-1})\,u(k-1) + (1 - 0.6q^{-1})\,e(k)$, 

with RMS value of noise equal to 0.1. The process is also 
being acted on by step load disturbances in the output ($e_y$) 
as well as in the input ($e_u$). In this example, first an $e_y$ 
and an $e_u$ of magnitude +3 are applied. Following this, $\lambda_2$ 
costing of magnitude 1 is introduced on the prediction 
increment, and the same disturbances $e_y$ and $e_u$ are 
re-applied in the reverse direction. Finally, the cost on the 
prediction increment is withdrawn, and $e_y$ and $e_u$ of the 
same magnitude are re-applied. The exercise is repeated with 
an assumed noise observer filter, $T(q^{-1}) = (1 - 0.5q^{-1})/0.5$. 

The results are shown in Figs.5.3.2(a) and 5.3.2(b). It can 
be seen that the costing $\lambda_2$ is very effective in reducing 
the effects of the disturbances $e_y$ and $e_u$ on the 
predictor. The results also show that disturbance rejection 
performance of the predictor is much improved when the noise 
observer polynomial T is used; this protects the estimator 
and also since the disturbances are filtered, they have 
milder effects on the predictor. 

Figure 5.3.2 (a) Constrained explicit prediction with 
$\lambda$, without T-filter. 
(b) Constrained explicit prediction with $\lambda$ and with 
T-filter. (Curves shown: actual output, prediction.) 


5.3.4 Comparative Study 

The explicit or indirect method of prediction requires only 
one stage of parameter estimation, followed by solution of 
the identity (5.3.8) for each value of prediction step p. 
Since this identity can easily be solved recursively 
(Appendix 5A), the computational load is modest. However 
the accuracy of prediction mainly depends on one-stage 
estimation. 

On the other hand, the implicit or direct method of 
prediction uses the same formulation for estimation and 
prediction, which can make the predictor well behaved but 
the computational expense is higher as it is necessary to 
run p estimators in parallel. One alternative to 
multistage estimation is to use the same covariance matrix 
for the part of the data vector in (5.3.15) which remains 
unchanged as the prediction horizon increases, as explained 
below. 

In (5.3.14) the degrees are $\delta F_p'$ (= $\delta A - 1$) and $\delta G_p$ 
(= n+p-1), i.e. the number of parameters in $G_p$ increases 
with p. If, for every p>1, the first (p-1) parameters of $G_p$ 
are assumed to remain unchanged (at the values obtained from 
the estimation of the previous step, p-1), the data vector on 
the right-hand side of (5.3.14) remains unchanged when 
arranged as in (5.3.16). So the N stage estimation problem 
is configured into a one-stage estimation (pertaining to 
p=1) and (N-1) stages of parameter reconstruction problem 
(pertaining to $p \ge 2$): 

$y(k+p-N) - y(k-N) - G_1\,\Delta u_f(k+p-d-N)$ 

$\qquad = F_p'\,\Delta y_f(k-N) + G_2\,\Delta u_f(k+p-d-N) + E_p\,e(k+p-N)$,    (5.3.16) 

where 

$G_2(q^{-1}) = G_p(q^{-1}) - G_1(q^{-1})$,  $p \ge 2$, 

$G_1(q^{-1}) = g_0 + g_1q^{-1} + \ldots + g_{p-2}q^{-p+2}$. 

$\hat{G}_1$, the parameters of the first (p-1) terms of $G_p(q^{-1})$, 
are assumed to have been obtained from the earlier stages of 
estimation and only the parameters of $F_p'(q^{-1})$ and $G_2(q^{-1})$ 
are determined. Note that the noise terms in (5.3.16) are 
uncorrelated with the data. 

The estimation procedure can be summarized as follows: 

(1) Estimate the parameters of (5.3.16) for p=1 when $G_1 = 0$. 

(2) For the next value of p, the 1st parameter of the last 
$G_2$ is fixed and goes to $G_1$ and the parameters ($\theta$) of 
$F'$ and the new $G_2$ are determined from 

$\hat{\theta}_{p+1} = \hat{\theta}_p + k(1) \times$ (Estimation error), 

where only the estimation error changes from one step 
to the next; k(1) is the fixed Kalman estimator gain 
vector obtained for p = 1 from (1). 

(3) Repeat (2), until p = N. 


5.4 MULTISTEP PREDICTION THROUGH PROCESS MODEL 
RECURSION 

If the prediction horizon p is greater than the degree of 
the noise process, the prediction y(k+p|k) can be obtained 
by iterating the process model as explained below. 

For the sake of simplicity, without any loss of 
generality, consider the ARMA representation of the process: 

A(q _1 )y(k) = C(q -1 )e(k), (5.4.1) 

where A and C are stable polynomials of degree n and m 
respectively in the backward shift operator $q^{-1}$. Hence 

$\hat{y}(k+i|k) + a_1\hat{y}(k+i-1|k) + \ldots + a_n\hat{y}(k+i-n|k)$ 

$\qquad = e(k+i|k) + c_1e(k+i-1|k) + \ldots + c_me(k+i-m|k)$,    (5.4.2) 

where $\hat{y}(k+i|k) = y(k+i)$ for i = 0, -1, ... etc., and 
$e(k+i|k) = 0$ for i = 1, 2, ... etc. For $i \ge m+1$, the 
right-hand side of (5.4.2) vanishes, leading to 

$\hat{y}(k+i|k) = -a_1\hat{y}(k+i-1|k) - a_2\hat{y}(k+i-2|k) - \ldots - a_n\hat{y}(k+i-n|k)$. 

(5.4.3) 

Hence, the i-step ahead prediction $\hat{y}(k+i|k)$ in (5.4.3) will 
be optimal, if 

(1) the parameters $a_i$ are true, 

(2) $i > \delta C$, and 

(3) $\hat{y}(k+i-1|k)$, $\hat{y}(k+i-2|k)$, ... etc. are optimal. 

If an ARIMAX model is considered for the process, for 
$p > \delta C$, the predictor is given by 

$\tilde{A}(q^{-1})\,\hat{y}(k+p|k) = B(q^{-1})\,\Delta u(k+p-d)$,    (5.4.4) 

where 

$\tilde{A} = (1-q^{-1})A = 1 + \tilde{a}_1q^{-1} + \tilde{a}_2q^{-2} + \ldots + \tilde{a}_{n+1}q^{-n-1}$, 

$\tilde{A}(q^{-1})\,\hat{y}(k+p|k) = \hat{y}(k+p|k) + \tilde{a}_1\hat{y}(k+p-1|k) + \tilde{a}_2\hat{y}(k+p-2|k)$ 

$\qquad + \ldots + \tilde{a}_{n+1}\hat{y}(k+p-n-1|k)$. 

Summarizing the prediction procedure (Fig.5.4.1): 

(1) Estimate the parameters of the process model (5.4.1). 

(2) Compute the predictions $\hat{y}(k+p|k)$ for p = 1 to $\delta C$, using 
any suitable method; $\delta C$ is the degree of the noise 
process. 

(3) Compute the prediction $\hat{y}(k+p|k)$, for $p > \delta C$, by iterating 
the process model recursively as in (5.4.3) for ARMA or 
as in (5.4.4) for ARIMAX process. 

Figure 5.4.1 Schematic diagram of multistep predictor 
through process model recursions. 
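
Step (3) for the ARMA case can be sketched as below. The function name `recursive_predictions` and the numerical values are assumed for illustration; the parameters $a_i$ are taken as given, and the first $\delta C$ predictions are supplied by the caller from whichever method is used in step (2).

```python
def recursive_predictions(a, history, first_preds, horizon):
    """Multistep prediction by iterating the recursion (5.4.3).

    a           : [a_1, ..., a_n] of A(q^-1) = 1 + a_1 q^-1 + ... + a_n q^-n
    history     : past outputs [..., y(k-1), y(k)], at least n values
    first_preds : predictions for p = 1..deg(C), obtained by another method
    horizon     : largest prediction step p required
    """
    seq = list(history) + list(first_preds)
    for _ in range(horizon - len(first_preds)):
        # y_hat(k+i|k) = -a_1*y_hat(k+i-1|k) - ... - a_n*y_hat(k+i-n|k)
        seq.append(-sum(ai * seq[-i] for i, ai in enumerate(a, start=1)))
    return seq[len(history):]


# Assumed example: A = 1 - 0.5q^-1 and C = 1, so the recursion starts at p = 1.
preds = recursive_predictions([-0.5], [8.0], [], horizon=3)
print(preds)   # [4.0, 2.0, 1.0]
```

For a stable A polynomial and no input, the predictions decay towards zero as the horizon grows, as the printed values show.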


5.5 CASE STUDY: PREDICTION OF PRODUCT QUALITY 
IN IRON-ORE SINTERING PROCESS 

A multivariable predictor is used to predict product quality 
in an iron-ore sintering process. The problem is configured 
as a two input two output prediction problem. Since the data 
on the product quality are obtained from off-line measure- 
ments, real-time filtering is used before the data are used 
for prediction. 


5.5.1 Process Description and Prediction Problem 

Iron-ore sinter is a preprocessed feed material for blast 
furnaces. A schematic diagram of the sintering process is 
shown in Fig.5.5.1. Iron bearing fine materials are mixed 
with coke breeze, flux, water etc. to form the raw mix. The 
material is loaded onto a moving strand and is levelled to 
form a flat bed, the surface of which is ignited as it 
passes under an ignition hood. As the bed travels 
horizontally, a combustion zone is drawn downwards through 
the material under the influence of an exhaust fan, thereby 
driving off the volatiles and fusing the material to form 
sinter. Usually, the speed of the strand is adjusted such 
that by the time the processed material arrives at the end 
of the strand, the combustion zone reaches the bottom of the 
bed and thus allows just sufficient on-strand time. At the 
other end of the strand, the processed material is unloaded, 
crushed and screened. The sinter fines (typically <5 mm) are 
returned to the mixing station for reprocessing and the 
sinter product is passed for use in the blast furnace. 

The physical strength and the degree of oxidation, 
indicated by the FeO content, are the two important measures 
of quality of the sinter product. These measurements being 
available from infrequent off-line analyses, their 
prediction can be useful for the plant operators. The variables 
which most strongly influence the physical strength and the 
FeO content of sinter are the coke rate and the on-strand 
sintering time. If the on-strand process is strictly under 
control, the on-strand sintering time can be directly 
calculated from the strand speed; otherwise, both the strand 
speed and the waste gas (WG) temperature must be considered. 

Figure 5.5.1 Schematic diagram of the sintering process. 

The present study is based on data collected from a 
sintering plant of rated capacity of 13000 tons per day. 
Average values of the strand speed, waste gas temperature 
and coke rate measurements were available at 15-min 
intervals. An auxiliary variable, the temperature/speed 
factor, is used; it is empirically computed as 

Temperature/speed factor(k) 

= speed(k) × temperature(k) + WG temperature(k-1/2) 

where one time step is considered to be 1 hour. The strength 
and FeO measurements were available alternately, every hour. 
The objective was to produce hourly predictions of strength 
and FeO based on the two inputs, coke rate and temperature/ 
speed factor. From the knowledge of the process, the time 
delay associated with the coke rate was assumed to be 4 
hours and that for the temperature/speed factor was assumed 
to be 3 hours. 

Remarks : Measurements of FeO and sinter strength 

FeO content is determined by an analytical method. A 
representative sample of the sinter product is prepared and 
is tested typically using an X-ray diffractometer. 

The sinter strength is determined empirically. In a 
typical set up, 15 kg of representative sample is separated 
into a drum. The drum is rotated 200 times in 7.5 minutes, 
after which the product is treated in a vibrator for 3 
minutes. The resulting +6.3 mm percentage of the product is 
expressed as the ISO strength index. 


5.5.2 Data Preparation 

The strength and the FeO measurement data required low-pass 
filtering because of (i) inhomogeneity of the solid samples 
that are tested and (ii) the empirical nature of the 
strength measurement. The other data being averaged, no 
filtering was necessary. Unfortunately low-pass filtering 
tends to produce time lag which deteriorates the real-time- 
liness of the data. Hence in the present study, an extended 
first-order Butterworth filter (Appendix 14A) was used. When 
a new measurement is available, the predictor is used to 
compute a one-step ahead prediction, and this predicted 
value is entered into the filter instead of the actual value. 

The improvement in the performance may be explained as 
follows. A first-order Butterworth filter has a pole on the 
positive axis inside the unit disc, and a zero at -1. 
Ideally, the extended configuration introduces a zero at the 
origin in order to reduce the phase or time lag without 
altering the gain. In practice, due to the inaccuracy of 
one-step prediction, this zero tends to move towards the 
left of the origin and the reduction in time lag is less. 
Note that the bidirectional filtering, which is an off-line 
method, produces no lag, whereas the ordinary low-pass 
filtering produces significant lag. The present low-pass 
filtering method, being a real-time approach, is an ideal 
compromise. In the present case the ratio of (cut-off 
frequency)/(sampling frequency) is conservatively chosen as 
0.15 such that the information content in the data is not 
affected. Real-time filtering is discussed in Sec.14.3.2. 
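
The idea can be sketched as follows. This is one possible reading of the scheme and not the formulation of Appendix 14A, which is not reproduced here. It is assumed that the first-order Butterworth filter is obtained through the bilinear transform, giving $y(n) = b\,(x(n) + x(n-1)) + a\,y(n-1)$ with a zero at -1 and a pole at a, and that the extended configuration replaces the pair $(x(n), x(n-1))$ by $(\hat{x}(n+1|n), x(n))$. All function names are hypothetical.

```python
import math


def butter1_coeffs(fc_over_fs):
    """First-order low-pass Butterworth coefficients (bilinear transform).

    Returns (b, a) for y(n) = b*(x(n) + x(n-1)) + a*y(n-1).
    """
    K = math.tan(math.pi * fc_over_fs)
    return K / (1.0 + K), (1.0 - K) / (1.0 + K)


def lowpass(x, b, a):
    """Ordinary real-time filter with zero initial conditions."""
    out, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        y_prev = b * (xn + x_prev) + a * y_prev
        out.append(y_prev)
        x_prev = xn
    return out


def extended_lowpass(x, x_pred, b, a):
    """Extended filter: x_pred[n] is the one-step-ahead prediction of
    x(n+1) made at time n, entered in place of the newest sample."""
    out, y_prev = [], 0.0
    for xn, xp in zip(x, x_pred):
        y_prev = b * (xp + xn) + a * y_prev
        out.append(y_prev)
    return out


b, a = butter1_coeffs(0.15)                      # ratio quoted in the text
x = [math.sin(0.2 * n) for n in range(50)]       # assumed test signal, x(0) = 0
ordinary = lowpass(x, b, a)
extended = extended_lowpass(x, x[1:], b, a)      # idealized perfect predictor
```

With a perfect one-step predictor, the extended output at time n coincides with the ordinary output at time n+1, i.e. one sample of lag is removed while the gain is unchanged. With an imperfect predictor the reduction in lag is smaller, as noted in the text.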


5.5.3 Prediction Exercise 
Problem formulation 

Using the implicit prediction policy, the predictor is 
expressed as 


$\begin{bmatrix} y_1(k) \\ y_2(k) \end{bmatrix} = \begin{bmatrix} \theta_1^T(k) \\ \theta_2^T(k) \end{bmatrix} \phi(k) + \begin{bmatrix} e_1(k) \\ e_2(k) \end{bmatrix}$, 

where $y_1$ and $y_2$ represent the strength index and the FeO 
content respectively, $e_i$ are the uncorrelated equation 
errors, $\phi$ and $\theta_i$ are the data and predictor parameter 
vectors respectively. For one-step ahead prediction: 

$\phi(k) = [y_1(k-1)\ \ y_2(k-1)\ \ y_1(k-2)\ \ y_2(k-2)$ 

$\qquad u_1(k-4)\ \ u_2(k-3)\ \ u_1(k-5)\ \ u_2(k-4)\ \ u_1(k-6)\ \ u_2(k-5)]^T$, 

where $u_1$ and $u_2$ are coke rate and temperature/speed factor 
respectively. Thus the multivariable prediction may be 
solved as a multi-input/single output problem: 

(i) Parameter estimation: 

$y_i(k) = \phi^T(k)\,\theta_i(k) + e_i(k)$,  i = 1 and 2; 

(ii) p-step ahead prediction: 

$\hat{y}_i(k+p|k) = \phi^T(k+p)\,\hat{\theta}_i(k)$. 
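
The construction of the data vector for one-step ahead prediction can be sketched as below. The function names and the numerical series are hypothetical; only the ordering of the entries and the delays of 4 hours (coke rate) and 3 hours (temperature/speed factor) are taken from the text.

```python
def data_vector(y1, y2, u1, u2, k):
    """Data vector phi(k) for one-step ahead prediction (requires k >= 6).

    y1, y2 : strength index and FeO content (hourly series)
    u1, u2 : coke rate and temperature/speed factor (hourly series)
    """
    return [y1[k - 1], y2[k - 1], y1[k - 2], y2[k - 2],
            u1[k - 4], u2[k - 3], u1[k - 5], u2[k - 4], u1[k - 6], u2[k - 5]]


def predict(theta_i, phi):
    """Prediction of output i as the inner product of phi and theta_i."""
    return sum(t * x for t, x in zip(theta_i, phi))


# Made-up series, used only to show which samples enter phi(k).
y1 = [float(i) for i in range(20)]
y2 = [100.0 + i for i in range(20)]
u1 = [200.0 + i for i in range(20)]
u2 = [300.0 + i for i in range(20)]
phi = data_vector(y1, y2, u1, u2, k=10)
```

The same vector serves both outputs; only the parameter vector $\theta_i$ differs between the strength and the FeO predictors.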

In the present application, the strength and the FeO 
measurements were available alternately, every hour (i.e. 
measurements of strength or FeO are available at two-hour 
intervals). So when a new strength measurement is available, 
first the multivariable predictor is used to compute 
one-step ahead prediction, then the extended low-pass 
filtering is performed, and the filtered value is used to evaluate 
multistep prediction. For the prediction of the sinter 
strength, the predictor uses information of the present and 
past coke rates, present and past temperature/speed factors, 
present and past strength measurements, measurement and 
prediction of FeO content produced in the past hour. In the 
following hour, as a new FeO measurement is obtained, 
prediction of FeO content is computed the same way. 

Figure 5.5.2 (a) One to two-hour ahead prediction of 
FeO content of sinter product, 
(b) one to two-hour ahead prediction of sinter strength. 

Results and discussions 

In the present application, the predictor parameter 
estimator is allowed to free-wheel through the hourly interpolated 
data sets between the two-hour intervals when the strength 
and FeO measurements are available. This procedure is found 
to produce better consistency of the parameters than when 
the estimator operates at two-hour intervals alone. The 
predictions of FeO and sinter strength are run in parallel, 
and the results of one are used in predicting the other. The 
results of one to two-hour predictions are presented in 
Fig.5.5.2(a) and Fig.5.5.2(b) (i.e. after every two hours, 
one-hourly and two-hourly predictions are produced). Thus in 
the present case, four recursive least squares estimators 
are run in parallel. Note that although the process is 
complex, the predictions produced are quite close. It is 
found that the two-hourly predictions are marginally 
inferior, if the prediction is computed by recursion of the 
process model instead of using the optimal prediction 
procedure. 


5.6 CONCLUSIONS 

Sensible prediction demands careful design of both the 
estimator and the predictor. In the present context, 
measures such as use of incremental data in an ARIMAX 
framework, use of noise observer filter, provision ensuring 
sensible steady-state values for the estimator covariances 
etc. can help the process of estimation. It is argued that 
real-time prefiltering of the data with minimum loss of 
phase information or real-timeliness, can also improve the 
quality of data for estimation and prediction. 

It is emphasized that the predictor need not be a 
reformulation of the estimator, like the conventional 
minimum variance predictor. Improved prediction and better 
disturbance rejection are possible through constrained 
minimum variance strategy, where prediction increments or 
change of predicted levels over specific time intervals are 
constrained. This approach permits incorporation of 
subjective information about the process into the predictor, 
which can go a long way in improving the information content 
in the predicted values. 
CHAPTER 6 


KALMAN FILTER AND STATE-SPACE APPROACHES 


Systems can be modelled using state variable 
representation. Kalman filter offers a method for 
state estimation and prediction. 


6.1 INTRODUCTION 

The system representations used so far have been based on 
input and output variables, which were assumed to be known 
quantities. Systems can also be modelled using state 
variables, which may or may not be measurable. Thus state- 
space representation permits incorporation of variables 
which can relate to internal behavior of systems that cannot 
be accessed or measured, along with the measurable external 
variables. Besides, the state-space approach also enables 
concise system representation which can be easily implemen- 
ted using computers. This chapter is devoted to the state- 
space approach to modelling of systems or processes, and 
estimation and prediction of state variables. 

The state-space model can be developed based on the 
input-output relationships or the prior knowledge of the 
physical laws governing the system. The modelling requires 
specification of the state variables, along with the input 
and output variables, and knowledge of the statistical 
nature of the uncertainties present. Once the model is 
developed, one of the main concerns is to produce estimation 
of the states. In state-space modelling, the term estimation 
is used in two different contexts, namely (i) estimation of 
the parameters (if unknown) of the state-space model, and 
(ii) estimation of the states, when the measurements are 
contaminated with additive noise. The parameter estimation 
problem has been discussed in detail in Chapter 3. A major 
part of this chapter is devoted to the problem of the 
estimation of the states using the Kalman filter. 

The Kalman filter (Kalman, 1960) can produce optimal 
estimation of the states, when the additive measurement 
noise is Gaussian and independent of the measurements. One 
of the strongest features of Kalman's algorithm is that 
recursive implementation is possible. This added to the fact 
that state-space approach itself offers many structural and 
computational advantages, makes estimation of states using 
the Kalman filter an attractive proposition. 

This chapter starts with the basics of state-space 
representation in Sec.6.2. The methods of state-space 
modelling are discussed next in Sec.6.3; both processes with 
or without noise are considered. Sec.6.4 presents modelling 
of periodic processes; state-space models are produced for 
the different structural components of periodic processes 
and prediction methods are developed. Sec.6.5 introduces the 
problem of optimal estimation of the states. Sec.6.6 is 
devoted to the study of the Kalman filter; the estimation 
algorithm is presented and its characteristic features are 
discussed. The prediction of the states, discussed in 
Sec.6.7, follows naturally from the state estimation. An 
application study relating to the estimation and prediction 
of the temperature in the soaking pit process at the 
finishing stages of steel-making is discussed in Sec.6.8; 
this study elaborates the issues involved in a complex 
industrial application, and the way the optimal estimation 
and prediction can be performed. 


6.2 STATE-SPACE REPRESENTATION 

A system is usually described by an external model expressed 
in terms of only the output of the system (for example, the 
time series model of the economic inflation of a country), 
or both the input and the output of the system (for example, 
the ARMA model of a furnace with fuel-gas flow into the 
furnace as input and the furnace temperature as output). The 
process variables of these models are directly measurable, 
and have physical connotations with respect to the system. 
However the external model provides no insight about the 
internal dynamics of the system. 

The internal model or state-space model offers an
alternative approach to system representation. These models
are expressed in terms of the outputs, the inputs and the
states of the system (Fig. 6.2.1). The states contain complete
historical information about the system, although the states
may not have any physical meaning and they may not be
directly measurable. The state-space model provides a
description of the internal and the external characteristics
of the system.

Figure 6.2.1 State-space representation schematic: inputs
(u_1, u_2, ..., u_n), states (x_1, x_2, ..., x_n) and outputs
(y_1, y_2, ..., y_n).

A fundamental property of a dynamical system is that it 
has a memory. In other words, behaviour of the system at any 
instant depends on the variables currently acting on it, as 
well as on the variables that had acted on it in the past, 
information on which remains stored in the states of the
system. Thus only the present state, and the present and 
future inputs are required to predict the future behaviour 
of the system. This characteristic of the state-space models 
of isolating the future from the past by incorporating all 
the past information in the current states is called the 
Markov property (see Sec.2.2.1). A stochastic process is 
called a Markov process, if 

$$
P\{x(k+1) \mid x(k), x(k-1), x(k-2), \ldots, x(0)\}
  = P\{x(k+1) \mid x(k)\},
$$

that is, the conditional probability density function for
x(k+1) depends only on its present value x(k) and not on any
value in the past. Thus, the Markov property implies
probabilistic causality.

State equations can be formulated in different forms to 
represent different types of processes. Unlike the inputs 
and the outputs, the state variables are not unique and the 
designer is free to attribute appropriate definitions to the 
states and formulate the model accordingly. The state of a 
system is a vector, and the state-space models are described 
by matrix operators. 

Basic requirements for modelling 

(a) To form a state-space model at least three variables
are necessary: the input (u or e), the output (y) and the
state (x) variables. The dimension of the state vector is
at least equal to the order of the system.

(b) The relationship between the inputs, the outputs and
the state variables, through the system parameters, has to be
defined, and it may not be time invariant.

(c) The law for the transformation of the state vector from
one time instant to the next should be stated; the initial
value for the state x(0) should also be known.

(d) The joint statistics of all variables should be known.

Examples of state-space models follow in the next section.

Figure 6.3.1 Structure of the process given by (6.3.1).


6.3 STATE EQUATIONS FROM DIFFERENCE EQUATION MODELS 

The two broad classes of processes are those with or without 
the measurement noise in the output. 


6.3.1 Processes without Measurement Noise 

Consider the difference equation model 

$$
y(k) + a_1 y(k-1) + \cdots + a_n y(k-n)
  = b_0 u(k-1) + b_1 u(k-2) + \cdots + b_n u(k-n-1), \tag{6.3.1}
$$

where y is the output and u is the input. There may be many 
state-space realizations of (6.3.1), each comprising basic 
structural blocks like adders, scalars, delay or backward 
shift operators etc. A typical representation of (6.3.1) is 
shown in Fig. 6.3.1. 

Choosing the state variables $x_i$ as the outputs of the
shift operators, the following state relations are obtained.
Let the state variables be defined as follows:
$x_1(k) = y(k)$, and since $q^{-1} x_1(k+1) = x_1(k)$,

$$
\begin{aligned}
x_1(k+1) &= -a_1 y(k) + b_0 u(k) + x_2(k), \\
x_2(k+1) &= -a_2 y(k) + b_1 u(k) + x_3(k), \\
&\;\;\vdots \\
x_n(k+1) &= -a_n y(k) + b_{n-1} u(k) + x_{n+1}(k), \\
x_{n+1}(k+1) &= b_n u(k).
\end{aligned} \tag{6.3.2}
$$


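Hence the state equation becomes:

$$
\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ \vdots \\ x_n(k+1) \\ x_{n+1}(k+1) \end{bmatrix}
=
\begin{bmatrix}
-a_1 & 1 & 0 & \cdots & 0 \\
-a_2 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-a_n & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ x_n(k) \\ x_{n+1}(k) \end{bmatrix}
+
\begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \\ b_n \end{bmatrix} u(k),
\tag{6.3.3}
$$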

with the output equation expressed as

$$
y(k) = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ x_{n+1}(k) \end{bmatrix}.
\tag{6.3.4}
$$


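The generic expressions for (6.3.3) and (6.3.4) are given by
(Fig. 6.3.2)

$$
x(k+1) = A x(k) + b u(k), \tag{6.3.5}
$$

$$
y(k) = c^T x(k) + D u(k), \tag{6.3.6}
$$

where

$$
A = \begin{bmatrix}
-a_1 & 1 & 0 & \cdots & 0 \\
-a_2 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-a_n & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix},
\quad
b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \\ b_n \end{bmatrix},
\quad
c^T = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix},
\quad D = 0.
\tag{6.3.7}
$$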


Equations (6.3.5 - 6.3.6) are called the canonical form of the
state equations, which is of special significance in system
theory. Equations (6.3.3 - 6.3.4) are in the observable
canonical form.
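
The equivalence between the difference equation (6.3.1) and the
state-space form (6.3.5 - 6.3.7) can be checked numerically. The
following is a minimal sketch in plain Python, assuming zero
initial conditions; the function names and the coefficient
values are hypothetical and chosen only for illustration.

```python
def observable_canonical(a, b):
    """Build A, b and c of (6.3.7) from a = [a1..an] and b = [b0..bn]."""
    n = len(a)
    if len(b) != n + 1:
        raise ValueError("expected n+1 input coefficients b0..bn")
    size = n + 1
    A = [[0.0] * size for _ in range(size)]
    for i in range(n):
        A[i][0] = -a[i]       # first column holds -a1..-an
        A[i][i + 1] = 1.0     # ones on the superdiagonal
    return A, list(b), [1.0] + [0.0] * n


def simulate_state_space(A, b_vec, c, u):
    """Iterate x(k+1) = A x(k) + b u(k), y(k) = c^T x(k), from x(0) = 0."""
    size = len(b_vec)
    x = [0.0] * size
    y = []
    for uk in u:
        y.append(sum(ci * xi for ci, xi in zip(c, x)))
        x = [sum(A[i][j] * x[j] for j in range(size)) + b_vec[i] * uk
             for i in range(size)]
    return y


def simulate_difference_equation(a, b, u):
    """Iterate (6.3.1) directly, with zero initial conditions."""
    y = []
    for k in range(len(u)):
        acc = 0.0
        for i, ai in enumerate(a, start=1):
            if k - i >= 0:
                acc -= ai * y[k - i]
        for j, bj in enumerate(b):
            if k - j - 1 >= 0:
                acc += bj * u[k - j - 1]
        y.append(acc)
    return y


a = [-1.5, 0.7]           # hypothetical a1, a2
b = [1.0, 0.5, 0.2]       # hypothetical b0, b1, b2
u = [1.0, 0.0, -1.0, 2.0, 0.5, 0.0, 1.0, -0.5]
A, b_vec, c = observable_canonical(a, b)
y_ss = simulate_state_space(A, b_vec, c, u)
y_de = simulate_difference_equation(a, b, u)
max_gap = max(abs(p - q) for p, q in zip(y_ss, y_de))
```

Under these assumptions the two output sequences agree to within
rounding error, so `max_gap` is expected to be negligibly small.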

The effect of variation of the time delay between y and
u may be considered as follows.

Figure 6.3.2 State-space diagram for the n-th order
difference equation model (6.3.1).


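Case 1 Time delay = zero

In (6.3.1), unity time-delay is assumed between the input
u(k) and the output y(k). If the time delay is zero, that is

$$
y(k) + a_1 y(k-1) + \cdots + a_n y(k-n)
  = b_0 u(k) + b_1 u(k-1) + \cdots + b_n u(k-n),
$$

the state-space representation (6.3.7) will be modified as

$$
\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ \vdots \\ x_{n-1}(k+1) \\ x_n(k+1) \end{bmatrix}
=
\begin{bmatrix}
-a_1 & 1 & 0 & \cdots & 0 \\
-a_2 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-a_{n-1} & 0 & 0 & \cdots & 1 \\
-a_n & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ x_{n-1}(k) \\ x_n(k) \end{bmatrix}
+
\begin{bmatrix}
b_1 - a_1 b_0 \\ b_2 - a_2 b_0 \\ \vdots \\ b_{n-1} - a_{n-1} b_0 \\ b_n - a_n b_0
\end{bmatrix} u(k),
\tag{6.3.8a}
$$

with the output equation expressed as

$$
y(k) = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ x_n(k) \end{bmatrix}
+ b_0 u(k).
\tag{6.3.8b}
$$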
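
As a hedged illustration of Case 1, the sketch below builds the
zero-delay model (6.3.8a - 6.3.8b) and compares its output with a
direct iteration of the zero-delay difference equation, assuming
zero initial conditions. The function names and the numerical
coefficients are hypothetical.

```python
def zero_delay_model(a, b):
    """Build A, b, c and D of (6.3.8) from a = [a1..an], b = [b0..bn]."""
    n = len(a)
    if len(b) != n + 1:
        raise ValueError("expected n+1 input coefficients b0..bn")
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][0] = -a[i]
        if i + 1 < n:
            A[i][i + 1] = 1.0
    # Input vector elements b_i - a_i * b_0, i = 1..n
    b_vec = [b[i + 1] - a[i] * b[0] for i in range(n)]
    c = [1.0] + [0.0] * (n - 1)
    return A, b_vec, c, b[0]      # D = b0 is the direct feed-through


def simulate(A, b_vec, c, D, u):
    """y(k) = c^T x(k) + D u(k), x(k+1) = A x(k) + b u(k), x(0) = 0."""
    n = len(b_vec)
    x = [0.0] * n
    y = []
    for uk in u:
        y.append(sum(ci * xi for ci, xi in zip(c, x)) + D * uk)
        x = [sum(A[i][j] * x[j] for j in range(n)) + b_vec[i] * uk
             for i in range(n)]
    return y


def difference_equation(a, b, u):
    """Direct iteration of the zero-delay difference equation."""
    y = []
    for k in range(len(u)):
        acc = sum(bj * u[k - j] for j, bj in enumerate(b) if k - j >= 0)
        acc -= sum(ai * y[k - i]
                   for i, ai in enumerate(a, start=1) if k - i >= 0)
        y.append(acc)
    return y


a = [-1.5, 0.7]           # hypothetical a1, a2
b = [1.0, 0.5, 0.2]       # hypothetical b0, b1, b2
u = [1.0, 0.0, -1.0, 2.0, 0.5, 0.0, 1.0, -0.5]
A, b_vec, c, D = zero_delay_model(a, b)
y_ss = simulate(A, b_vec, c, D, u)
y_de = difference_equation(a, b, u)
max_gap = max(abs(p - q) for p, q in zip(y_ss, y_de))
```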


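Case 2 Time-delay = d

Now the process model (6.3.1) becomes

$$
y(k) + a_1 y(k-1) + \cdots + a_n y(k-n)
  = b_0 u(k-d) + b_1 u(k-d-1) + \cdots + b_n u(k-d-n),
$$

and the state-space description changes to

$$
\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ \vdots \\ x_{n+d}(k+1) \end{bmatrix}
=
\begin{bmatrix}
-a_1 & 1 & 0 & \cdots & 0 \\
-a_2 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-a_n & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & \ddots & \vdots \\
\vdots & \vdots & \vdots & & 1 \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ x_{n+d}(k) \end{bmatrix}
+
\begin{bmatrix} 0 \\ \vdots \\ 0 \\ b_0 \\ \vdots \\ b_n \end{bmatrix} u(k),
$$

and

$$
y(k) = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ x_{n+d}(k) \end{bmatrix};
\tag{6.3.9}
$$

here the state matrix has unity elements on its superdiagonal,
the d-1 leading elements of b are 0, and D = 0, with
reference to the model (6.3.5 - 6.3.6).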

Remark: If the orders of the difference polynomials in y and u
in (6.3.1) are different, the dimensions of the state
descriptors are altered accordingly.


Advantages 

Some general advantages of state-space modelling follow. 

(1) Internal modelling 

Since the internal physical dynamics can be modelled, better 
analysis and understanding of the overall input-output 
dynamics of the system are possible. Although some state 
variables may not have physical connotations, they may have 
analytical significance. 

The model shows which states can be directly measured 
(or observed), and which can be controlled; not all states 
are observable or controllable. The model may be used to 
estimate the states which cannot be observed. 

(2) Simplification of expression 

A state-space model can be a simplified expression for a
relatively complicated process, e.g., an n-th order
difference equation model (6.3.1) can be represented by
first-order difference equations in the state-space format
(6.3.2 - 6.3.4). Similarly an n-th order differential
equation can also be expressed as n first-order differential
equations through state-space description.

(3) Flexibility of modelling

Since state variables are not unique, for the same
input-output model, different state-space representations
are possible.

(4) Multivariable case 

The structured format of the state-space model is marginally 
altered when a multi-input multi-output process is 
considered instead of a single-input single-output process, 
which is computationally advantageous. 


6.3.2 Processes with Noise 

Equation (6.3.1) was a deterministic model in discrete time, 
where all variables were assumed to be known. In practice, 
the process may be subject to unknown disturbances acting as 
random input to the process or as noise, corrupting the 
output measurements; the former is referred to as the 
process noise and the latter as the measurement noise. The 
properties of the noise are discussed in Sec. 2.3.1. 

Consider the process model 

$$
\begin{aligned}
y(k) + a_1 y(k-1) + \cdots + a_n y(k-n)
  ={}& b_0 u(k-1) + b_1 u(k-2) + \cdots + b_n u(k-n-1) \\
   & + e(k) + c_1 e(k-1) + \cdots + c_n e(k-n),
\end{aligned} \tag{6.3.10a}
$$

where u and y are the input and the output respectively as 
in (6.3.1) with e as the additional noise input to the 
system. The model (6.3.10a) can be concisely expressed 
as 

$$
A(q^{-1}) y(k) = B(q^{-1}) u(k-d) + C(q^{-1}) e(k), \tag{6.3.10b}
$$

where d = 1 and

$$
\begin{aligned}
A(q^{-1}) &= 1 + a_1 q^{-1} + a_2 q^{-2} + \cdots + a_n q^{-n}, \\
B(q^{-1}) &= b_0 + b_1 q^{-1} + b_2 q^{-2} + \cdots + b_n q^{-n}, \\
C(q^{-1}) &= 1 + c_1 q^{-1} + c_2 q^{-2} + \cdots + c_n q^{-n}.
\end{aligned}
$$

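The state-space model is given by

$$
\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ \vdots \\ x_n(k+1) \\ x_{n+1}(k+1) \end{bmatrix}
=
\begin{bmatrix}
-a_1 & 1 & 0 & \cdots & 0 \\
-a_2 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-a_n & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ x_n(k) \\ x_{n+1}(k) \end{bmatrix}
+
\begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \\ b_n \end{bmatrix} u(k)
+
\begin{bmatrix} c_1 - a_1 \\ c_2 - a_2 \\ \vdots \\ c_n - a_n \\ 0 \end{bmatrix} e(k),
\tag{6.3.11a}
$$

$$
y(k) = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ x_{n+1}(k) \end{bmatrix}
+ e(k).
\tag{6.3.11b}
$$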

If the output measurements (or observations) are
contaminated with additive noise (v), the measured output
$y^*$ can be expressed as

$$
y^*(k) = y(k) + v(k). \tag{6.3.12}
$$

The overall state-space and the observation model can be
expressed as (Fig. 6.3.3):

$$
x(k+1) = A x(k) + b u(k) + s e(k), \tag{6.3.13}
$$

$$
y^*(k) = c^T x(k) + e(k) + v(k); \tag{6.3.14}
$$

A, b, s and c correspond to (6.3.11).




Remarks

(1) Note that any variable affecting the output directly
(i.e. without any time delay) has to appear in the
measurement equation (e.g., (6.3.14)) of the state-space
model.

(2) If the process is completely deterministic, the noise
terms in the process model and the state-space model will
disappear. If the noise present is unmeasurable, its
estimated values may be used:

$$
\hat{e}(k) = y(k) - h^T \hat{x}(k \mid k-1),
$$

where y(k) is the measured output and $\hat{x}(k \mid k-1)$ is the
one-step ahead prediction of the state, computed at time
(k-1) (further discussions follow in Sec. 6.6).

(3) The number of states required for the minimal state-space
representation (6.3.11) for an ARMAX process is given by
$\max\{\delta A,\ \delta B + d,\ \delta C\}$, where $\delta A$,
$\delta B$ and $\delta C$ are the degrees of $A(q^{-1})$,
$B(q^{-1})$ and $C(q^{-1})$ respectively, and d is the time
delay between the exogenous input u and the output y.
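
The noise input vector of (6.3.11a) can be checked in the same
way as the deterministic model. The sketch below is an
illustration only: it assumes zero initial conditions and uses
hypothetical coefficients and a seeded random generator, and it
compares the state-space output with a direct iteration of
(6.3.10a).

```python
import random


def armax_state_space(a, b, c):
    """Build A, b, s and h of (6.3.11) from a1..an, b0..bn, c1..cn."""
    n = len(a)
    if len(b) != n + 1 or len(c) != n:
        raise ValueError("expected b0..bn and c1..cn")
    size = n + 1
    A = [[0.0] * size for _ in range(size)]
    for i in range(n):
        A[i][0] = -a[i]
        A[i][i + 1] = 1.0
    s = [c[i] - a[i] for i in range(n)] + [0.0]   # noise input vector
    h = [1.0] + [0.0] * n                          # output vector
    return A, list(b), s, h


def simulate_state_space(A, b_vec, s, h, u, e):
    """x(k+1) = A x + b u + s e, y(k) = h^T x(k) + e(k), x(0) = 0."""
    size = len(b_vec)
    x = [0.0] * size
    y = []
    for uk, ek in zip(u, e):
        y.append(sum(hi * xi for hi, xi in zip(h, x)) + ek)
        x = [sum(A[i][j] * x[j] for j in range(size))
             + b_vec[i] * uk + s[i] * ek
             for i in range(size)]
    return y


def simulate_armax(a, b, c, u, e):
    """Direct iteration of (6.3.10a) with zero initial conditions."""
    y = []
    for k in range(len(u)):
        acc = e[k]
        acc -= sum(ai * y[k - i]
                   for i, ai in enumerate(a, start=1) if k >= i)
        acc += sum(bj * u[k - j - 1]
                   for j, bj in enumerate(b) if k >= j + 1)
        acc += sum(ci * e[k - i]
                   for i, ci in enumerate(c, start=1) if k >= i)
        y.append(acc)
    return y


rng = random.Random(0)
a = [-1.5, 0.7]           # hypothetical a1, a2
b = [1.0, 0.5, 0.2]       # hypothetical b0, b1, b2
c = [0.3, -0.1]           # hypothetical c1, c2
u = [rng.uniform(-1.0, 1.0) for _ in range(50)]
e = [rng.gauss(0.0, 0.1) for _ in range(50)]
A, b_vec, s, h = armax_state_space(a, b, c)
y_ss = simulate_state_space(A, b_vec, s, h, u, e)
y_de = simulate_armax(a, b, c, u, e)
max_gap = max(abs(p - q) for p, q in zip(y_ss, y_de))
```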

Example 6.3.2(1) State-space model for a stage in the 
paper-making process 

The basis weight (y) in the paper-making process, can be 
controlled by manipulating the thick stock flow (u); a 
typical model for the closed-loop controlled process is 
given by (Astrom and Wittenmark, 1973) 

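$$
\begin{aligned}
(1 - q^{-1})(1 - 1.283 q^{-1} + 0.425 q^{-2})\, y(k)
  ={}& (1 - q^{-1})(2.307 q^{-1} - 2.025 q^{-2})\, u(k-2) \\
   & + 0.382\,(1 - 1.438 q^{-1} + 0.550 q^{-2})\, e(k).
\end{aligned} \tag{6.3.15}
$$

The state-space model for this process can be produced as
follows.

Rewriting (6.3.15),

$$
\begin{aligned}
(1 - 2.283 q^{-1} + 1.708 q^{-2} - 0.425 q^{-3})\, y(k)
  ={}& (2.307 q^{-1} - 4.332 q^{-2} + 2.025 q^{-3})\, u(k-2) \\
   & + (1 - 1.438 q^{-1} + 0.550 q^{-2})\, e'(k),
\end{aligned}
$$

where e'(k) = 0.382e(k). So the process (6.3.15) may be
expressed as

$$
\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ x_3(k+1) \\ x_4(k+1) \\ x_5(k+1) \end{bmatrix}
=
\begin{bmatrix}
 2.283 & 1 & 0 & 0 & 0 \\
-1.708 & 0 & 1 & 0 & 0 \\
 0.425 & 0 & 0 & 1 & 0 \\
 0 & 0 & 0 & 0 & 1 \\
 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \\ x_4(k) \\ x_5(k) \end{bmatrix}
+
\begin{bmatrix} 0 \\ 0 \\ 2.307 \\ -4.332 \\ 2.025 \end{bmatrix} u(k)
+
\begin{bmatrix} 0.845 \\ -1.158 \\ 0.425 \\ 0 \\ 0 \end{bmatrix} e'(k),
$$

$$
y(k) = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \\ x_4(k) \\ x_5(k) \end{bmatrix}
+ e'(k).
$$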
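
The numerical entries of this example can be reproduced from the
factored polynomials in (6.3.15). The following sketch is only
an illustration: the helper names `poly_mul` and `pad` are
hypothetical, and list index i is taken to hold the coefficient
of q^-i.

```python
def poly_mul(p, q):
    """Multiply two polynomials in q^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def pad(coeffs, length):
    """Right-pad a coefficient list with zeros up to `length`."""
    return list(coeffs) + [0.0] * (length - len(coeffs))


# Index i holds the coefficient of q^-i.
A_poly = poly_mul([1.0, -1.0], [1.0, -1.283, 0.425])
B_poly = poly_mul([1.0, -1.0], [0.0, 2.307, -2.025])
B_delayed = [0.0, 0.0] + B_poly      # u(k-2) shifts B by q^-2
C_poly = [1.0, -1.438, 0.550]

size = 5                             # number of states in the example
a = pad(A_poly[1:], size)            # a1..a5
c = pad(C_poly[1:], size)            # c1..c5
first_column = [-ai for ai in a]     # first column of the state matrix
b_vec = pad(B_delayed[1:], size)     # input vector (row i acts on u(k-i))
s_vec = [ci - ai for ci, ai in zip(c, a)]   # noise vector, c_i - a_i
```

The lists `first_column`, `b_vec` and `s_vec` should then match
the first column of the state matrix, the input vector and the
noise vector given above.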

Example 6.3.2(2) State-space model for the yearly averaged 
sunspot series 

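The subset-AR model for this series (Appendix 8A) is given in
Example 3.6.3(1) as

$$
y(k) = 1.2495\,y(k-1) - 0.551\,y(k-2) + 0.15\,y(k-9) + e(k).
$$

The state-space model for this process can be expressed as

$$
\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ x_3(k+1) \\ \vdots \\ x_9(k+1) \end{bmatrix}
=
\begin{bmatrix}
 1.2495 & 1 & 0 & \cdots & 0 \\
-0.5510 & 0 & 1 & \cdots & 0 \\
 0 & 0 & 0 & \ddots & \vdots \\
 \vdots & \vdots & \vdots & & 1 \\
 0.1500 & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \\ \vdots \\ x_9(k) \end{bmatrix}
+
\begin{bmatrix} 1.2495 \\ -0.5510 \\ 0 \\ \vdots \\ 0.1500 \end{bmatrix} e(k),
$$

$$
y(k) = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} x_1(k) \\ \vdots \\ x_9(k) \end{bmatrix}
+ e(k),
$$

where the elements of the first column of the state matrix and
of the noise vector in rows 3 to 8 are zero.

Remark: The state vector in case of the full AR model is
of the same size as the one for the subset AR model.


6.4 STATE-SPACE MODELS FOR PERIODIC PROCESSES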

A process or a time-series may have characteristic
components like the trend, the periodic or seasonal component
etc., which may be implicitly or explicitly modelled.

In the implicit approach, the trend and the seasonal
component are absorbed along with the signal and the noise
components into a general purpose transfer-function model
through the appropriate differencing of the data as in the
case of the Box and Jenkins models (Sec. 4.3). This model can
be expressed in state-space format as discussed in Sec. 6.3.

In the explicit approach, a structural model (discussed
in Sec. 2.6.1) of the time series is presumed to be

$$
y(k) = y_{tr}(k) + y_p(k) + \eta(k), \tag{6.4.1}
$$

where $y_{tr}$ and $y_p$ are the trend and periodic components
respectively and $\eta$ is the model uncertainty. It is assumed
that the trend and the periodic component(s) are separately
available; the composite series is modelled by modelling its
trend and periodic components individually.

In this section explicit modelling has been considered. 


6.4.1 The Trend Model 


Consider a linear model for the trend:

$$
\Delta y_{tr}(k+1) = \beta(k) + w_1(k+1), \qquad \Delta = 1 - q^{-1},
$$

that is

$$
y_{tr}(k+1) = y_{tr}(k) + \beta(k) + w_1(k+1), \tag{6.4.2}
$$

where $\beta$ represents the instantaneous change of level (i.e.
the slope); it is assumed that $\beta$ can be modelled as

$$
\beta(k+1) = \beta(k) + w_2(k), \tag{6.4.3}
$$

where $\{w_1(k)\}$ and $\{w_2(k)\}$ are zero-mean, white noise
sequences with known statistics:

$$
E[w_{tr}(i)\, w_{tr}^T(j)] =
\begin{cases} Q(i), & i = j, \\ 0, & i \neq j, \end{cases}
\qquad
w_{tr}(k) = \begin{bmatrix} w_1(k) & w_2(k) \end{bmatrix}^T.
$$

The state-space representation for the trend component
follows as

$$
x_{tr}(k+1) = A_{tr}\, x_{tr}(k) + S_{tr}\, w_{tr}(k), \tag{6.4.4}
$$

$$
y_{tr}(k) = c_{tr}^T\, x_{tr}(k) + w_1(k), \tag{6.4.5}
$$

where

$$
x_{tr}(k+1) = \begin{bmatrix} x_{tr1}(k+1) \\ x_{tr2}(k+1) \end{bmatrix},
\quad
A_{tr} = \begin{bmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{bmatrix},
\quad a_{11} = a_{12} = a_{22} = 1,
$$

$$
S_{tr} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\quad
c_{tr} = \begin{bmatrix} 1 & 0 \end{bmatrix}^T.
$$

So the states can be defined as

$$
x_{tr1}(k) = y_{tr}(k) - w_1(k), \qquad x_{tr2}(k) = \beta(k).
$$
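
The state definitions above can be checked against (6.4.2) and
(6.4.3) by simulation. The sketch below is a rough illustration
with hypothetical initial values and noise levels; it assumes a
unity noise input matrix in (6.4.4), which is what the state
definitions imply, and verifies that the two states then satisfy
x1(k+1) = x1(k) + x2(k) + w1(k) and x2(k+1) = x2(k) + w2(k).

```python
import random

rng = random.Random(1)
T = 30
w1 = [rng.gauss(0.0, 0.5) for _ in range(T + 1)]   # hypothetical noise levels
w2 = [rng.gauss(0.0, 0.1) for _ in range(T + 1)]

# Simulate the trend (6.4.2) and the slope (6.4.3) directly.
y_tr = [10.0]       # hypothetical initial level
beta = [0.3]        # hypothetical initial slope
for k in range(T):
    y_tr.append(y_tr[k] + beta[k] + w1[k + 1])
    beta.append(beta[k] + w2[k])

# States as defined in the text.
x1 = [y_tr[k] - w1[k] for k in range(T + 1)]
x2 = beta

# Residuals of the state equation with a unity noise input matrix.
residuals = []
for k in range(T):
    r1 = x1[k + 1] - (x1[k] + x2[k] + w1[k])
    r2 = x2[k + 1] - (x2[k] + w2[k])
    residuals.append(max(abs(r1), abs(r2)))
max_residual = max(residuals)
```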

Example 

The German unemployment series (Appendix 7E) has a trend 
component and a periodic component (see Fig. 14.3.1). 
Assuming the trend component to be separately available 
(say, through bidirectional filtering, discussed in 
Sec.14.3.1), the state-space model (6.4.4 - 6.4.5) may be 
used to represent the trend component. 

The model is initialized with

$$
x_{tr1}(0) = y_{tr}(0), \quad \text{and} \quad
x_{tr2}(0) = \beta(0) = y_{tr}(1) - y_{tr}(0).
$$

As the time index k increments, a new value for $y_{tr}(k)$ is
available; based on the model (6.4.4 - 6.4.5), the estimate
$\hat{y}_{tr}(k \mid k)$ may be produced using the Kalman filter
discussed in Sec. 6.6.2.


6.4.2 Periodic Component Model

There are different ways of modelling the periodic component. 

(a) Basic model 
Consider the model 

$$
\sum_{i=0}^{N-1} y_p(k-i) = w_p(k), \tag{6.4.6}
$$

where N specifies the length of the period, and $\{w_p(k)\}$ is
a zero-mean white noise sequence. Here the expectation of the
sum of the components $y_p$ over a period is assumed to be
zero. The state-space representation is given by

$$
x_p(k+1) = A_p\, x_p(k) + g_p\, w_p(k),
$$

$$
y_p(k) = c_p^T\, x_p(k) + w_p(k),
$$

where



(b) Periodic random walk model 

The random walk process has been discussed in Sec. 2.3.2.
Periodic differencing may be used to model the periodic
component as

$$
y_p(k) - y_p(k-N) = w_p(k), \tag{6.4.7}
$$

where N is the period length, and $w_p$ is zero-mean white
noise. The state-space representation follows as

$$
x_p(k+1) = A_p\, x_p(k) + s_p\, w_p(k),
$$

$$
y_p(k) = c_p^T\, x_p(k) + w_p(k),
$$

where



Example 

Consider formation of the state-space model for (6.4.7) with
N = 3.

Here,

$$
x_p(k+1) =
\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} x_p(k)
+ \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} w_p(k),
$$

$$
y_p(k) = x_{p1}(k) + w_p(k).
$$

The validity of the model can be shown as follows.

$$
\begin{aligned}
x_{p1}(k+1) &= x_{p3}(k), \\
x_{p2}(k+1) &= x_{p1}(k) + w_p(k), \\
x_{p3}(k+1) &= x_{p2}(k).
\end{aligned}
$$

So

$$
x_{p1}(k+1) = x_{p3}(k) = x_{p2}(k-1) = x_{p1}(k-2) + w_p(k-2);
$$

hence

$$
y_p(k+1) = y_p(k-2) + w_p(k+1),
$$

which is the same as the model (6.4.7).
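
The same check can be carried out numerically. The sketch below
generalizes the N = 3 matrices to an arbitrary period length by
cyclically shifting the states; this generalization, the function
names and the initial state are assumptions made for
illustration, not the general form given in the text.

```python
import random


def periodic_random_walk_model(N):
    """State matrix and noise vector generalizing the N = 3 example."""
    if N < 2:
        raise ValueError("period length N must be at least 2")
    A = [[0.0] * N for _ in range(N)]
    A[0][N - 1] = 1.0            # x1(k+1) = xN(k)
    for i in range(1, N):
        A[i][i - 1] = 1.0        # x_{i+1}(k+1) = x_i(k)
    g = [0.0] * N
    g[1] = 1.0                   # noise enters the second state
    return A, g


def simulate(A, g, x0, w):
    """Iterate the model with y_p(k) = x_p1(k) + w_p(k)."""
    N = len(g)
    x = list(x0)
    y = []
    for wk in w:
        y.append(x[0] + wk)
        x = [sum(A[i][j] * x[j] for j in range(N)) + g[i] * wk
             for i in range(N)]
    return y


N = 3
A_p, g_p = periodic_random_walk_model(N)
rng = random.Random(2)
w = [rng.gauss(0.0, 1.0) for _ in range(40)]
y = simulate(A_p, g_p, [1.0, -2.0, 0.5], w)    # hypothetical x_p(0)

# (6.4.7): y_p(k) - y_p(k-N) should equal w_p(k) for k >= N.
max_gap = max(abs(y[k] - y[k - N] - w[k]) for k in range(N, len(w)))
```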
(c) Periodic model based on trigonometric functions

The trigonometric Fourier series representation of a
periodic signal is discussed in Sec. 2.5.1 and Sec. 2.5.3.
If the periodic sequence $\{y_p(k)\}$ has a period-length N,

$$
y_p(k) = a_0 + \sum_{n=1}^{M} \left[ a_n \cos(n\omega k) + b_n \sin(n\omega k) \right],
\tag{6.4.8}
$$

where $\omega = 2\pi/N$; if N is odd, M = (N-1)/2, and if N is
even, M = N/2; for n = N/2,

$$
\cos(n\omega k) = (-1)^k, \quad \text{and} \quad \sin(n\omega k) = 0.
$$

Let N be even (see Remarks for the case when N is odd). The
coefficients $a_n$ and $b_n$ are given by

$$
\begin{aligned}
a_0 &= \frac{1}{N} \sum_{k=1}^{N} y_p(k), \\
a_n &= \frac{2}{N} \sum_{k=1}^{N} y_p(k) \cos(n\omega k), \\
b_n &= \frac{2}{N} \sum_{k=1}^{N} y_p(k) \sin(n\omega k),
\end{aligned} \tag{6.4.9}
$$

where n = 1, 2, ..., N/2 - 1. For n = N/2,

$$
a_{N/2} = \frac{1}{N} \sum_{k=1}^{N} y_p(k) (-1)^k. \tag{6.4.10}
$$

The objective is to estimate $a_n$ and $b_n$, defined as the
states, while the orthogonal basis functions given by the
sine and cosine terms are known. The recursive formulation
for the coefficients, $a_n$ and $b_n$, is developed as follows.

Define

$$
y(k) = a_0(i) + \sum_{n=1}^{N/2-1} \left[ a_n(i) \cos n\omega(k-i) + b_n(i) \sin n\omega(k-i) \right]
     + a_{N/2}(i)\,(-1)^{(k-i)},
\tag{6.4.11}
$$

i being an arbitrary time index; equation (6.4.11) is valid
because the set of time functions,

$$
\{ 1,\ \cos\omega(k-i),\ \sin\omega(k-i),\ \cos 2\omega(k-i),\ \sin 2\omega(k-i),\ \ldots,\ (-1)^{(k-i)} \},
$$

is a set of orthogonal basis functions for any series which
is periodic with period N. Following (6.4.11), y(k) can be
redefined as

$$
y(k) = a_0(i+1) + \sum_{n=1}^{N/2-1} \left[ a_n(i+1) \cos n\omega(k-i-1) + b_n(i+1) \sin n\omega(k-i-1) \right]
     + a_{N/2}(i+1)\,(-1)^{(k-i-1)}.
\tag{6.4.12}
$$

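Now,

$$
\begin{aligned}
\cos n\omega[(k-i)-1] &= \cos n\omega(k-i) \cos n\omega + \sin n\omega(k-i) \sin n\omega, \\
\sin n\omega[(k-i)-1] &= \sin n\omega(k-i) \cos n\omega - \cos n\omega(k-i) \sin n\omega.
\end{aligned}
$$

Hence equating the coefficients of $\cos n\omega(k-i)$ and
$\sin n\omega(k-i)$ in (6.4.11) and (6.4.12),

$$
\begin{aligned}
a_n(i+1) \cos n\omega - b_n(i+1) \sin n\omega &= a_n(i), \\
b_n(i+1) \cos n\omega + a_n(i+1) \sin n\omega &= b_n(i).
\end{aligned}
$$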
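Since

$$
\begin{bmatrix} \cos n\omega & -\sin n\omega \\ \sin n\omega & \cos n\omega \end{bmatrix}
= U_n \ \text{(say)},
\tag{6.4.13}
$$

is an orthogonal matrix (i.e. $U_n^T U_n = U_n U_n^T = I$),

$$
\begin{bmatrix} a_n(i+1) \\ b_n(i+1) \end{bmatrix}
=
\begin{bmatrix} \cos n\omega & \sin n\omega \\ -\sin n\omega & \cos n\omega \end{bmatrix}
\begin{bmatrix} a_n(i) \\ b_n(i) \end{bmatrix}.
\tag{6.4.14}
$$

Again, from (6.4.11) and (6.4.12),

$$
a_0(i+1) = a_0(i), \qquad a_{N/2}(i+1) = (-1)\, a_{N/2}(i).
\tag{6.4.15}
$$

Hence with the states defined as

$$
x(i) = \begin{bmatrix} a_0(i) & a_1(i) & b_1(i) & a_2(i) & b_2(i) & \cdots & a_{N/2}(i) \end{bmatrix}^T,
\tag{6.4.16}
$$

$$
x(i+1) = A_p\, x(i), \tag{6.4.17}
$$

where

$$
A_p = \begin{bmatrix}
1 & & & & \\
 & U_1^T & & & \\
 & & \ddots & & \\
 & & & U_{N/2-1}^T & \\
 & & & & -1
\end{bmatrix}.
\tag{6.4.18}
$$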


Again in (6.4.11), for i = k, the cosine terms become
unity and the sine terms become zero, that is

$$
y(k) = a_0(k) + \sum_{n=1}^{N/2-1} a_n(k) + a_{N/2}(k). \tag{6.4.19}
$$

The state-space representation for the seasonal component is
therefore given by

$$
x(k+1) = A_p\, x(k) + w_p(k), \tag{6.4.20}
$$

$$
y(k) = c_p^T\, x(k) + v_p(k), \tag{6.4.21}
$$

where following (6.4.16) and (6.4.19),

$$
c_p = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & \cdots & 1 & 0 & 1 \end{bmatrix}^T;
$$

$v_p$ and $w_p$ are the measurement noise and the seasonal
process noise respectively; $A_p$ is given by (6.4.18).

Remarks 

(a) A p in (6.4.18) is an orthogonal matrix. 

(b) The first term $a_0(k)$ in (6.4.11) represents the average
level; it can be eliminated if $\{y(k)\}$ is zero-mean.

(c) The period length N was assumed to be even here. If N
is odd, the term $a_{N/2}$ as in (6.4.10) will disappear; the
rest of the formulation is similar.
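
The structure of the trigonometric model can be illustrated with
a short numerical check. The sketch below, for even N only,
builds the block-diagonal state matrix from the rotation in
(6.4.14) and the sign change in (6.4.15), forms the output vector
of (6.4.21), and confirms that the noise-free model reproduces
the series (6.4.11) with i = 0. The function names and the
coefficient values are hypothetical.

```python
import math


def trig_periodic_model(N):
    """State matrix and output vector for an even period length N."""
    if N < 2 or N % 2:
        raise ValueError("this sketch covers even period lengths only")
    omega = 2.0 * math.pi / N
    A = [[0.0] * N for _ in range(N)]
    A[0][0] = 1.0                              # a0(i+1) = a0(i)
    for n in range(1, N // 2):
        r = 2 * n - 1                          # row of a_n; b_n follows
        cs, sn = math.cos(n * omega), math.sin(n * omega)
        A[r][r], A[r][r + 1] = cs, sn          # rotation of (6.4.14)
        A[r + 1][r], A[r + 1][r + 1] = -sn, cs
    A[N - 1][N - 1] = -1.0                     # a_{N/2} changes sign
    c = [1.0] + [1.0, 0.0] * (N // 2 - 1) + [1.0]
    return A, c, omega


def direct_series(x0, omega, k):
    """Evaluate (6.4.11) with i = 0 from the coefficients in x0."""
    N = len(x0)
    value = x0[0] + x0[-1] * (-1) ** k
    for n in range(1, N // 2):
        a_n, b_n = x0[2 * n - 1], x0[2 * n]
        value += a_n * math.cos(n * omega * k) + b_n * math.sin(n * omega * k)
    return value


N = 12
A_p, c_p, omega = trig_periodic_model(N)
# Hypothetical coefficients [a0 a1 b1 a2 b2 a3 b3 a4 b4 a5 b5 a6]
x0 = [2.0, 1.0, 0.5, -0.3, 0.2, 0.1, 0.0, -0.4, 0.6, 0.05, -0.05, 0.25]
x = list(x0)
max_gap = 0.0
for k in range(3 * N):
    y_model = sum(ci * xi for ci, xi in zip(c_p, x))
    max_gap = max(max_gap, abs(y_model - direct_series(x0, omega, k)))
    x = [sum(A_p[i][j] * x[j] for j in range(N)) for i in range(N)]

# Orthogonality check: A_p^T A_p should be the identity matrix.
gram = [[sum(A_p[k][i] * A_p[k][j] for k in range(N)) for j in range(N)]
        for i in range(N)]
orth_gap = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
               for i in range(N) for j in range(N))
```

After a whole number of periods the state returns to its starting
value, which is another way of seeing that the model is periodic
with period N.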

Example 6.4.2 Modelling the periodic component in the 
German unemployment series 

This is a series of monthly data with yearly periodicity
(Appendix 7E), N = 12. It is assumed that the trend and the
periodic components of the series are separately available
(see Sec. 14.3.1). The modelling of the trend component is
discussed in Sec. 6.4.1; the periodic component is assumed to
be available through bidirectional filtering as in Example
14.3.1. The modelling starts from the year 1950.

(i) Define the state vector (6.4.16) of length N

$$
x(k) = \begin{bmatrix} a_0 & a_1 & b_1 & a_2 & b_2 & a_3 & b_3 & a_4 & b_4 & a_5 & b_5 & a_6 \end{bmatrix}^T.
$$

(ii) The state-space model is given by (6.4.20 - 6.4.21).

(iii) $A_p$ is given by

$$
A_p = \begin{bmatrix}
1 & & & & \\
 & U_1 & & & \\
 & & \ddots & & \\
 & & & U_5 & \\
 & & & & -1
\end{bmatrix},
$$

where

$$
U_i = \begin{bmatrix} \cos i\omega & \sin i\omega \\ -\sin i\omega & \cos i\omega \end{bmatrix},
\quad i = 1 \text{ to } 5, \quad \omega = 2\pi/12.
$$

(iv) $c_p$ is given by

$$
c_p = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}^T.
$$

(v) x(0) is given by (6.4.9) (n = 1 to 5, N = 12,
$\omega = 2\pi/12$); the initial state is assumed to start from
1950. For the first period (k = 1) the values of the initial
states ($\times 10^{-5}$) work out to be

a_0 = 2.582, a_1 = 0.549, a_2 = -0.021, a_3 = 0.007,
a_4 = -0.112, a_5 = 0.046, a_6 = 0.005,

b_1 = 3.842, b_2 = 1.334, b_3 = 0.523, b_4 = 0.183,
b_5 = -0.055.

(vi) Assume Q and R are the covariances of the process noise
$w_p$ and the measurement noise $v_p$ respectively. The initial
state-estimation error covariance $P_0$ is also assumed. A
typical choice could be Q = 10I, R = 1, $P_0$ = 100I.

Having developed the state-space model, the Kalman filter
(Sec. 6.6.2) can be used to obtain the optimal estimates of
the available observations $\{y(k)\}$.


Remarks 

(a) The process and measurement noise covariances Q and R 
can be estimated (Mehra, 1972), but the procedure is quite 
complex. So a trial and error approach may be applied, which 
however is a drawback. 

(b) A choice of P 0 with high diagonal values implies low 
confidence in the initial estimates of the states. 

(c) Steps (i) to (iv) are common to all time-series with 
monthly data and yearly periodicity. 


6.4.3 Prediction Problem Formulation 

For a series with additive structural components (6.4.1), 
the combined state-space representation is given by 

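$$
x(k+1) = A x(k) + w(k),
$$

$$
y(k) = c^T x(k) + v(k),
$$

where

$$
x(k) = \begin{bmatrix} x_{tr}(k) \\ x_p(k) \end{bmatrix},
\quad
A = \begin{bmatrix} A_{tr} & 0 \\ 0 & A_p \end{bmatrix},
\quad
c = \begin{bmatrix} c_{tr}^T & c_p^T \end{bmatrix}^T,
$$

$$
w(k) = \begin{bmatrix} w_{tr}(k) & w_p(k) \end{bmatrix}^T,
\quad
v(k) = v_{tr}(k) + v_p(k).
$$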

All the components are specified by the trend model and the 
periodic process model discussed in Sec.6.4.1 and Sec.6.4.2 
respectively. The subsequent estimation of the states and 
multistep prediction will involve the following steps. 

(a) Assumption of the noise statistics: the mean and the 
covariance of w and v need to be assumed. Usually these 
noise processes are assumed to be zero-mean, Gaussian. 

(b) Initialization of the state estimates: this can be 
obtained from (6.4.9) and (6.4.10). 

(c) State estimation: use an optimal estimator to
compute recursively the state estimates $\hat{x}(k \mid k)$. The
Kalman filter, discussed in Sec. 6.6, may be used.

(d) Multistep prediction: the p-step ahead prediction is
given by

$$
\hat{y}(k+p \mid k) = c^T \hat{x}(k+p \mid k),
$$

where

$$
\hat{x}(k+p \mid k) = A^p\, \hat{x}(k \mid k),
$$

as discussed in Sec. 6.7.
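
Step (d) amounts to applying the state matrix p times to the
latest filtered state. A minimal sketch follows, using the trend
state matrix of Sec. 6.4.1 on its own; the function names and
the filtered state values are hypothetical.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def predict_output(A, c, x_filtered, p):
    """Return c^T A^p x(k|k), the p-step ahead output prediction."""
    if p < 0:
        raise ValueError("prediction horizon p must be non-negative")
    x = list(x_filtered)
    for _ in range(p):
        x = mat_vec(A, x)       # one time update without measurement
    return sum(ci * xi for ci, xi in zip(c, x))


A_tr = [[1.0, 1.0], [0.0, 1.0]]     # trend model: level and slope
c_tr = [1.0, 0.0]
x_hat = [10.0, 0.5]                 # hypothetical filtered level and slope
forecasts = [predict_output(A_tr, c_tr, x_hat, p) for p in range(1, 5)]
# forecasts == [10.5, 11.0, 11.5, 12.0]
```

For the trend model alone the prediction is simply the filtered
level extrapolated along the filtered slope.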


6.5 OPTIMAL STATE ESTIMATION 

The states of a system can be estimated from available
measurements which may be contaminated with noise.

One of the ways of optimal estimation of the states is
the Bayesian approach. The Bayesian method considers the
variables to be estimated as random, and it requires
knowledge of the prior probability distribution function of
these variables (regarded as parameters). The Bayesian rule
states that the conditional probability density function
P(x|y) is given by

$$
P(x \mid y) = \frac{P(x)\, P(y \mid x)}{\int P(x)\, P(y \mid x)\, dx},
$$

where the random variable x is observed through the noisy
measurement y; P(x) is the prior probability density
function of x before y is observed, and the likelihood
function P(y|x) specifies the probabilities of the
observations y as a function of a range of possible values of
x (in the present context, x refers to the parameter values).
The main problems in the realization of the Bayesian
estimator are:

(a) the prior statistics of the variables to be estimated
may not be available, whereas in the case of incorrect
assumptions of the prior statistics P(x), the estimated
parameters are likely to be biased;

(b) this method requires evaluation of the posterior
probability distribution function P(x|y), which can be
computationally intensive.
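
To make the Bayesian rule concrete, the following sketch
evaluates the posterior density on a grid for a scalar x with a
Gaussian prior and a Gaussian measurement error. The grid limits,
the number of grid points and the numerical values are
assumptions made for illustration; the exhaustive evaluation over
the grid also hints at why the approach becomes expensive for
vector-valued x.

```python
import math


def gaussian_pdf(z, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def grid_posterior(y, prior_mean, prior_var, noise_var,
                   lo=-10.0, hi=10.0, steps=4001):
    """Evaluate P(x|y) on a uniform grid using the Bayesian rule."""
    dx = (hi - lo) / (steps - 1)
    xs = [lo + i * dx for i in range(steps)]
    # Numerator P(x) P(y|x), with y = x + Gaussian noise
    unnorm = [gaussian_pdf(x, prior_mean, prior_var)
              * gaussian_pdf(y, x, noise_var) for x in xs]
    evidence = sum(unnorm) * dx          # approximates the integral
    return xs, [u / evidence for u in unnorm], dx


xs, posterior, dx = grid_posterior(y=2.0, prior_mean=0.0,
                                   prior_var=1.0, noise_var=1.0)
post_mean = sum(x * p for x, p in zip(xs, posterior)) * dx
post_var = sum((x - post_mean) ** 2 * p
               for x, p in zip(xs, posterior)) * dx
```

With equal prior and noise variances the posterior mean lies
midway between the prior mean and the measurement, and the
posterior variance is half the prior variance.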

A statistical alternative to the probabilistic approach is 
to consider the Gauss-Markov model: 

$$
x(k+1) = A x(k) + b u(k) + w(k),
$$

$$
y(k) = C x(k) + v(k),
$$

where the process noise w and the measurement noise v have 
Gaussian or normal distributions; in such cases the states 
can be estimated using the Kalman filter. 


6.6 THE KALMAN FILTER 

The Kalman filter (Kalman, 1960) is the best linear 
estimator which can produce an optimal estimate of the 
states of a linear dynamic system, subject to the 
disturbances having Gaussian distribution. The optimality is 
in minimum estimation-error-variance sense. 

In general, the function of a filter is to separate the 
signal or information from the noise-corrupted data. In the 
field of signal processing, filtering usually refers to 
frequency domain separation of signal from noise, e.g., 
low-pass, high-pass or band-pass filtering. The separation 
is possible only if the signal and the noise lie in 
different frequency ranges. When the signal and the noise 
occupy overlapping frequency bands, the Kalman filter can be 
used to compute the optimal estimate of the signal.


6.6.1 The Estimation Problem

Consider the following difference equation representation of
the system and its environment in a multi-input multi-output
configuration

$$
x(k+1) = A x(k) + S w(k), \tag{6.6.1}
$$

$$
y(k) = C x(k) + v(k), \tag{6.6.2}
$$

where x is the n×1 state vector, y is the r×1 vector of
measured outputs, w is the p×1 vector of random inputs to the
system (p ≤ n) and v is the r×1 vector of additive measurement
noise. The matrices A, S and C are n×n, n×p and r×n matrices
respectively. Here, the deterministic input u is assumed to
be zero, which is not a limitation.

The objective is to obtain $\hat{x}(k \mid k)$, the minimum
variance estimate based on the available information on the
states $\hat{x}(k \mid k-1)$ and the measurements,
y(0), ..., y(k), subject to the following assumptions.


(a) Noise statistics 

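The process noise $\{w(k)\}$ and the measurement noise
$\{v(k)\}$ are independent, zero-mean, white noise sequences
with known statistics:

$$
E[w(k)] = 0, \quad \text{for all } k, \tag{6.6.3}
$$

$$
E[v(k)] = 0, \quad \text{for all } k, \tag{6.6.4}
$$

$$
E[w(k)\, w^T(\tau)] =
\begin{cases} Q(k), & k = \tau, \\ 0, & k \neq \tau, \end{cases}
\tag{6.6.5}
$$

$$
E[v(k)\, v^T(\tau)] =
\begin{cases} R(k), & k = \tau, \\ 0, & k \neq \tau, \end{cases}
\quad \text{and} \tag{6.6.6}
$$

$$
E[w(k)\, v^T(\tau)] = 0 \quad \text{for all } k, \tau. \tag{6.6.7}
$$

Remark: The Kalman filter equations (Sec. 6.6.2) can be
proved to be valid even if the noise sequences are not
uncorrelated and (6.6.7) is violated.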

(b) Initial States 

The initial state x(0) is a random variable with known
statistics:

$$
\text{mean: } E[x(0)] = x_0, \quad \text{and} \quad
\text{variance: } E[(x(0) - x_0)(x(0) - x_0)^T] = P_0.
\tag{6.6.8a}
$$

The initial state x(0) is assumed to be independent of the
process noise $\{w(k)\}$:

$$
E[x(0)\, w^T(k)] = 0 \quad \text{for all } k \geq 0. \tag{6.6.8b}
$$

(c) System parameters 

The parameters of A, S and C are assumed to be known. 

The Kalman filter produces the best estimate
$\hat{x}(k \mid k)$ of x(k), minimizing the scalar cost function

$$
J = E[(x(k) - \hat{x}(k \mid k))^T (x(k) - \hat{x}(k \mid k))].
$$


6.6.2 Kalman Filter Equations 

The state estimate and one-step prediction are given by the 
following equations. 

(i) Measurement update (or correction) equations:

$$
\hat{x}(k \mid k) = \hat{x}(k \mid k-1) + K(k)\,[y(k) - C \hat{x}(k \mid k-1)]. \tag{6.6.9}
$$

(ii) Time update (or prediction) equations:

$$
\hat{x}(k+1 \mid k) = A\, \hat{x}(k \mid k). \tag{6.6.10}
$$

The process and the measurement models (6.6.1 - 6.6.2) and the
Kalman state estimator can be schematically expressed as
shown in Fig. 6.6.1.

The weighting or gain matrix K in (6.6.9) is given by

$$
K(k) = P(k \mid k-1)\, C^T\, [C\, P(k \mid k-1)\, C^T + R(k)]^{-1}, \tag{6.6.11}
$$

where P, the covariance of the estimation error for the
measurement update and the time update, is given by

$$
P(k \mid k) = [I - K(k)\, C]\, P(k \mid k-1), \tag{6.6.12}
$$

$$
P(k+1 \mid k) = A\, P(k \mid k)\, A^T + S\, Q(k)\, S^T, \tag{6.6.13}
$$

the initial conditions at $k = k_0$ being

$$
\hat{x}(k_0 \mid k_0 - 1) = x_0, \quad \text{and} \quad P(k_0 \mid k_0 - 1) = P_0.
$$

Thus, following (6.6.9) to (6.6.13), with the progression of
time, as new measurements y(k) become available, the
estimation and the prediction of the states, and the updating
of the corresponding error covariances and the Kalman gain,
can follow recursively through the two stages of measurement
update and time update as shown in Fig. 6.6.2.

Figure 6.6.1 Structural diagram of the process and the
measurement model, coupled with the Kalman filter state
estimator.

Figure 6.6.2 Flow diagram of the sequential computation
of the state estimates, the covariance matrix and
the Kalman (filter) gain, shown for k = 0, 1, 2
(m.u.: measurement update; t.u.: time update).
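
The recursion (6.6.9) to (6.6.13) can be sketched in a few lines
of plain Python. The sketch below is restricted to a scalar
measurement (r = 1), so that the matrix inverse in (6.6.11)
reduces to a division; this restriction, the function names, and
the noise-free ramp used as test data are assumptions made for
illustration. The argument `SQS` stands for the product S Q S^T.

```python
def measurement_update(x_pred, P_pred, y, c, R):
    """Equations (6.6.9), (6.6.11) and (6.6.12) for a scalar output."""
    n = len(x_pred)
    Pc = [sum(P_pred[i][j] * c[j] for j in range(n)) for i in range(n)]
    innov_var = sum(c[i] * Pc[i] for i in range(n)) + R
    K = [v / innov_var for v in Pc]                      # Kalman gain
    innovation = y - sum(c[i] * x_pred[i] for i in range(n))
    x_filt = [x_pred[i] + K[i] * innovation for i in range(n)]
    cP = [sum(c[k] * P_pred[k][j] for k in range(n)) for j in range(n)]
    P_filt = [[P_pred[i][j] - K[i] * cP[j] for j in range(n)]
              for i in range(n)]
    return x_filt, P_filt, K


def time_update(x_filt, P_filt, A, SQS):
    """Equations (6.6.10) and (6.6.13)."""
    n = len(x_filt)
    x_pred = [sum(A[i][j] * x_filt[j] for j in range(n)) for i in range(n)]
    AP = [[sum(A[i][k] * P_filt[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    P_pred = [[sum(AP[i][k] * A[j][k] for k in range(n)) + SQS[i][j]
               for j in range(n)] for i in range(n)]
    return x_pred, P_pred


def kalman_filter(measurements, A, c, SQS, R, x0, P0):
    """Alternate measurement and time updates over all measurements."""
    x_pred, P_pred = list(x0), [row[:] for row in P0]
    filtered = []
    for y in measurements:
        x_filt, P_filt, _ = measurement_update(x_pred, P_pred, y, c, R)
        filtered.append(x_filt)
        x_pred, P_pred = time_update(x_filt, P_filt, A, SQS)
    return filtered, x_pred, P_pred


# Hypothetical test: a noise-free ramp observed through the trend model.
A = [[1.0, 1.0], [0.0, 1.0]]
c = [1.0, 0.0]
SQS = [[0.0, 0.0], [0.0, 0.0]]
measurements = [3.0 + 2.0 * k for k in range(20)]
filtered, x_next, P_next = kalman_filter(
    measurements, A, c, SQS, 1.0, [0.0, 0.0],
    [[1000.0, 0.0], [0.0, 1000.0]])
level, slope = filtered[-1]
```

With a large initial covariance the filtered level and slope
settle close to the values of the underlying ramp after a few
measurements. For vector measurements the division would have to
be replaced by a matrix inversion or, preferably, a linear solve.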
6.6.3 Properties and Salient Features


(a) Innovations sequence

The term $C\hat{x}(k \mid k-1)$ in (6.6.9) is the one-stage
predicted output $\hat{y}(k \mid k-1)$, and
$\{y(k) - C\hat{x}(k \mid k-1)\}$ is the one-stage prediction
error sequence, also referred to as the innovations sequence.
$K(k)[y(k) - C\hat{x}(k \mid k-1)]$, the weighted innovation,
acts as a correction to the predicted estimate
$\hat{x}(k \mid k-1)$ to form the estimate $\hat{x}(k \mid k)$;
the weighting matrix K is commonly referred to as the filter
gain or the Kalman gain matrix.

The innovation represents the additional information
available to the filter in consequence of the new
observation y(k). For an optimal filter, the innovation
sequence is a sequence of independent Gaussian random
variables.

The occurrence of bad data first shows up in the
innovation vector. The discrepancies in the innovations
sequence may be analysed probabilistically and if bad data
is identified, it can be discarded before it can damage the
filter.


(b) Estimation error covariances 

In the course of Kalman filtering, the estimation error 
covariances (6.6.12 - 6.6.13) indicate the degree of 
inaccuracy of the estimates. The error covariances are 
dependent on the process parameters and the noise statistics 
but not on the measurements. So different state-space 
designs may be Kalman filtered to ascertain which model 
produces least estimation error covariances and hence 
produces best estimates. Estimation error covariances may 
also be used to determine if the uncertainty in the 
estimates can be improved by the introduction of additional 
states or additional sensors. 
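
Because the error covariances do not depend on the measurements,
they (and the gains) can be computed before any data are
collected. The sketch below illustrates this for a scalar model
x(k+1) = a x(k) + w(k), y(k) = c x(k) + v(k); the function name
and the parameter values are hypothetical.

```python
def gain_sequence(a, c, q, r, p0, steps):
    """Iterate (6.6.11) to (6.6.13) for a scalar model, without data."""
    p_pred = p0
    gains, variances = [], []
    for _ in range(steps):
        k_gain = p_pred * c / (c * p_pred * c + r)   # (6.6.11)
        p_filt = (1.0 - k_gain * c) * p_pred         # (6.6.12)
        gains.append(k_gain)
        variances.append(p_filt)
        p_pred = a * p_filt * a + q                  # (6.6.13)
    return gains, variances


# Hypothetical design: compare two sensors of different quality.
gains_good, var_good = gain_sequence(0.9, 1.0, 0.1, 0.5, 10.0, 50)
gains_poor, var_poor = gain_sequence(0.9, 1.0, 0.1, 2.0, 10.0, 50)
```

Comparing `var_good[-1]` with `var_poor[-1]` gives the kind of
design comparison described above, here between a sensor with low
and one with high measurement noise variance.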


(c) Sampling periodicity and data reconstruction 

The sampling interval need not be uniform for the operation
of the Kalman filter. There are two possibilities:

(i) Normal operation: even if the time intervals between
subsequent measurements vary, the Kalman filter may be
run as normal, i.e. iterating through the twin steps of
measurement update and time update; the next measurement
update is performed only when a new measurement is
received, and the time update follows, and so on.

(ii) Data reconstruction: until the next measurement
is available the state estimation may keep cycling
continuously only through the time update (6.6.10). Thus

$$
\hat{x}(k_2 \mid k_1) = A(k_2 \mid k_1)\, \hat{x}(k_1 \mid k_1),
$$

where

$$
A(k_2 \mid k_1) = A^{k_2 - k_1};
$$

for further details see (6.7.5). $k_1$ is the time when
the last measurement was available. This is an interesting
data reconstruction feature of the Kalman filter,
which can be utilized for multistep prediction
(discussed in Sec. 6.7), as well as for reconstruction
of the missing data.

(d) Other aspects 

(i) Since the Kalman filter is a linear filter, moments up 
to the second order (i.e. means and covariances) appear in 
the filter formulations (6.6.9 - 6.6.13). (ii) The appli- 
cation of the Kalman filter for data or state smoothing is 
discussed in Sec.14.2.

Adaptive Kalman filtering 

In general, the adaptation may be with respect to unknown, 
slowly time-varying noise covariances, deterministic inputs, 
parameter values etc. Since Kalman filtering mainly concerns 
estimation of states in a noisy environment, adaptive Kalman 
filtering usually refers to adaptation with respect to noise 
statistics. 

When the state-space model is known, but the process 
noise and the measurement noise covariances Q and R in 
(6.6.5 - 6.6.6) are unknown, adaptive filtering can be
performed using the Bayesian, the maximum likelihood, the
correlation or the covariance matching methods (Mehra, 
1972). The noise is assumed to be stationary. Adaptation 
with respect to deterministic but unknown inputs, or with 
respect to both unknown noise covariances and unknown 
deterministic inputs, is also possible; see for example 
Moghaddamjoo (1989). The problem of adaptation with respect 
to unknown parameters has been discussed in Chapter 3. 
Adaptive simultaneous estimation of parameters and the
states can be performed using extended Kalman filters, where
the parameter vector is appended to the state vector, and
the enlarged state vector is estimated (Jazwinski, 1970,
Anderson and Moore, 1979).


6.6.4 Implementation Aspects
Initialization 

For the implementation of the Kalman filter, the following 
should be known: 

(a) the parameters constituting the matrices A, S and C in 
the model (6.6.1 - 6.6.2). 

(b) the covariances of the noise Q and R in (6.6.5 - 6.6.6). 

(c) the initial estimate of the state $x_0$, and the initial
covariance of the estimation error $P_0$.

If any of the above are unknown, they have to be assumed or 
estimated. The parameters of the model can be estimated, for 
example using the least squares method from the transfer- 
function model (say, AR or ARMA model) of the process using 
the available input-output data. The noise covariances Q and 
R can be computed from the available data and prior true or 
approximate knowledge of the states. They may also be 
estimated (see remarks on adaptive Kalman filtering, above). 
The initial value of the state $x_0$ may be directly known from
the available measurements or may be assumed based on prior
process knowledge. The estimation error covariance $P_0$ is
indicative of the confidence in the initial state estimate,
and hence can be chosen appropriately. Usually $P_0$ is
considered to be a diagonal matrix.

The Kalman filter is relatively insensitive to an incorrect
choice of the noise covariances Q and R. However, the model
must be representative of the process for the Kalman filter
to produce sensible estimates.

Computation 

Instead of updating the symmetric matrix P in (6.6.12 -
6.6.13), its square root factor M may be updated, where

$$P = MM^{T}.$$

Some advantages of square-root filtering are as follows.

(a) Due to computational errors, on updating, P may not
work out to be non-negative definite, leading to erroneous
results, whereas $MM^{T}$ is always non-negative definite,
thus offering better numerical stability.

(b) Numerical inaccuracy due to machine round-off etc.
pertaining to the square root factor M will be less than
that of P.

There are different approaches to square root filtering
(Bierman, 1977). One of the popular approaches is UD
factorization:

$$P = UDU^{T},$$

where U is upper triangular with a unit diagonal, and D is a
diagonal matrix. In this approach there is no need for any
square root operation. So instead of P, the U and D factors are
sequentially updated. In the present case UD-measurement
update (6.6.12) and UD-time update (6.6.13) are to be
computed, as discussed in Appendix 3A and Appendix 13C
respectively.
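
The factorization itself can be sketched as follows for a small
symmetric positive definite matrix, assuming U has a unit diagonal.
This is only a minimal illustration of forming the factors of a given
P; the UD measurement and time updates referred to above are not
reproduced, and the function names are hypothetical.

```python
def ud_factorize(p):
    """Factor a symmetric positive definite matrix as P = U D U^T.

    U is unit upper triangular (returned as a list of rows) and D is
    returned as a list holding its diagonal entries. No square roots
    are taken anywhere.
    """
    n = len(p)
    u = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n - 1, -1, -1):          # work from the last column back
        d[j] = p[j][j] - sum(d[k] * u[j][k] ** 2 for k in range(j + 1, n))
        for i in range(j):
            acc = sum(d[k] * u[i][k] * u[j][k] for k in range(j + 1, n))
            u[i][j] = (p[i][j] - acc) / d[j]
    return u, d


def ud_reconstruct(u, d):
    """Rebuild P = U D U^T from its factors (useful as a check)."""
    n = len(d)
    return [[sum(u[i][k] * d[k] * u[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]
```

For a positive definite P all entries of D come out positive, which
gives a simple check on the health of the factors.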

Summary of sequential implementation: 

(1) Initialize the Kalman filter with the parameters of
the model (6.6.2 and 6.6.3), and Q, R, $x_0$, $P_0$.

(2) Read in a new measurement vector y.

(3) Perform measurement updates, (6.6.9) and (6.6.12), and
compute x(k|k) and P(k|k).

(4) Perform time updates (6.6.10) and (6.6.13), and compute
x(k+1|k), P(k+1|k) and K(k+1).

(5) Go to (2). 
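
The sequential steps above can be sketched for a scalar process as
follows. Since equations (6.6.9 - 6.6.13) are not restated in this
section, the sketch assumes the standard form of the measurement and
time updates for the model x(k+1) = a x(k) + s w(k), y(k) = c x(k) +
v(k); the function and argument names are illustrative only.

```python
def kalman_filter(measurements, a, s, c, q, r, x0, p0):
    """Run a scalar Kalman filter over a list of measurements.

    x0, p0 are used as the initial prediction x(0|-1), P(0|-1).
    Returns the list of (x(k|k), P(k|k)) and the final one-step
    prediction (x(k+1|k), P(k+1|k)).
    """
    x_pred, p_pred = x0, p0
    estimates = []
    for y in measurements:                       # step (2): read measurement
        k = p_pred * c / (c * p_pred * c + r)    # Kalman gain
        x_est = x_pred + k * (y - c * x_pred)    # step (3): measurement update
        p_est = (1.0 - k * c) * p_pred
        estimates.append((x_est, p_est))
        x_pred = a * x_est                       # step (4): time update
        p_pred = a * p_est * a + s * q * s
    return estimates, (x_pred, p_pred)
```

Here the gain is evaluated at the start of each measurement update
rather than at the end of the preceding time update; the two orderings
use the same predicted covariance and give the same gain.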


6.7 OPTIMAL PREDICTION 

This section concerns optimal multistep prediction of the 
state variable x(k+p|k); if desired, the prediction of the 
output variables can be obtained from the predicted states 
following the measurement equation (6.6.2) as 

y(k+p|k) = Cx(k+p|k). 

The concept of prediction is inherent in the Kalman
filter, and it has already been shown that the one-step ahead
prediction of the state, x(k+1|k), is generated by the Kalman
filter algorithm (6.6.9 - 6.6.10). Multistep prediction
follows as a natural extension.

The objective is to produce the p-step ahead estimate
x(k+p|k) of x at time k, given all data up to time k, where
p ≥ 1. The prediction will be optimal in a minimum variance
sense, that is, it minimizes the cost

$$J = E\{(x(k+p) - \hat{x}(k+p|k))^{T}(x(k+p) - \hat{x}(k+p|k))\}.$$

Following (6.6.2 - 6.6.3), consider the dynamic process and
measurement models

$$x(k+1) = Ax(k) + Bu(k) + Sw(k), \tag{6.7.1}$$

$$y(k) = Cx(k) + v(k), \tag{6.7.2}$$

where u is an h×1 deterministic input vector and B is an n×h
matrix which is assumed to be known; the other assumptions
stated in Sec.6.6.1 relating to the model (6.6.2 - 6.6.3)
are assumed to be valid.

Following (6.7.1), multistep prediction may be obtained as

$$\begin{aligned}
x(k+2) &= Ax(k+1) + Bu(k+1) + Sw(k+1) \\
       &= A^{2}x(k) + ABu(k) + Bu(k+1) + ASw(k) + Sw(k+1).
\end{aligned}$$

Similarly,

$$\begin{aligned}
x(k+3) &= Ax(k+2) + Bu(k+2) + Sw(k+2) \\
       &= A^{3}x(k) + A^{2}Bu(k) + ABu(k+1) + Bu(k+2) \\
       &\quad + A^{2}Sw(k) + ASw(k+1) + Sw(k+2).
\end{aligned}$$

Hence

$$x(k+p) = A^{p}x(k) + \sum_{i=k}^{m} A^{m-i}Bu(i) + \sum_{i=k}^{m} A^{m-i}Sw(i), \tag{6.7.3}$$

where m = k+p-1. Hence the predicted state based on the
available measurements up to time k is given by

$$\begin{aligned}
\hat{x}(k+p|k) &= E\{x(k+p) \mid y(0), y(1), \ldots, y(k)\} \\
 &= A^{p}E\{x(k) \mid y(0), y(1), \ldots, y(k)\} \\
 &\quad + \sum_{i=k}^{m} A^{m-i}B\,E\{u(i) \mid y(0), y(1), \ldots, y(k)\} \\
 &\quad + \sum_{i=k}^{m} A^{m-i}S\,E\{w(i) \mid y(0), y(1), \ldots, y(k)\}.
\end{aligned} \tag{6.7.4}$$

Since the random vectors {w(k), w(k+1),..., w(k+p-1)} are
zero-mean and independent of {y(0), y(1),..., y(k)}, the last
term in (6.7.4) vanishes. The optimal p-step state predictor
follows as:

$$\hat{x}(k+p|k) = A^{p}\hat{x}(k|k) + \sum_{i=k}^{m} A^{m-i}Bu(i). \tag{6.7.5a}$$

The optimal predictor can also be expressed as

$$\hat{x}(k+p|k) = A(k+p|k)\,\hat{x}(k|k) + \sum_{i=k}^{m} A(m|i)Bu(i), \tag{6.7.5b}$$

where $A(k+j|k) = A^{j}$.

The prediction error is given by

$$\begin{aligned}
\tilde{x}(k+p|k) &= x(k+p) - \hat{x}(k+p|k) \\
 &= A^{p}[x(k) - \hat{x}(k|k)] + \sum_{i=k}^{m} A^{m-i}Sw(i),
\end{aligned}$$

that is

$$\tilde{x}(k+p|k) = A^{p}\tilde{x}(k|k) + \sum_{i=k}^{m} A^{m-i}Sw(i). \tag{6.7.6}$$

It can be shown (see Appendix 6) that the covariance matrix
corresponding to the optimal p-step prediction is given by

$$P(k+p|k) = A^{p}P(k|k)(A^{p})^{T} + \sum_{i=k}^{m} A^{m-i}SQ(i)S^{T}(A^{m-i})^{T}, \tag{6.7.7}$$

where m = k+p-1.
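
A minimal scalar sketch of the p-step predictor and its error
variance is given below. It applies the one-step relations
repeatedly, which for a scalar model is equivalent to the closed-form
sums; the noise is assumed to enter through s as in (6.7.1), q is
assumed constant over the horizon, and the function name is
hypothetical.

```python
def predict_p_steps(x_est, p_est, a, b, s, q, future_inputs):
    """Propagate x(k|k) and P(k|k) through p = len(future_inputs) steps.

    future_inputs holds u(k), u(k+1), ..., u(k+p-1).
    Returns (x(k+p|k), P(k+p|k)) for the scalar model
    x(k+1) = a x(k) + b u(k) + s w(k), with Var(w) = q.
    """
    x, p = x_est, p_est
    for u in future_inputs:
        x = a * x + b * u            # state prediction, no noise term
        p = a * p * a + s * q * s    # error variance grows with each step
    return x, p


x_pred, p_pred = predict_p_steps(4.0, 0.0, a=0.5, b=1.0, s=1.0, q=1.0,
                                 future_inputs=[1.0, 1.0])
```

For this two-step example the recursion gives x_pred = 2.5 and
p_pred = 1.25, which agree with evaluating the sums directly.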


6.8 CASE STUDY: ESTIMATION AND PREDICTION OF INGOT 

TEMPERATURES AND HEATING IN SOAKING PITS 

A typical application of the Kalman filter in the steel- 
making industry is studied here. 

Prior to rolling, steel ingots have to be uniformly
heated to a specific temperature, which is often achieved
through heating in furnaces, called soaking pits. The Kalman
filter may be used to estimate the inaccessible ingot-surface
and the ingot-centre temperatures, and thereby the
time when the ingots will be available for rolling can be
predicted; this information is of vital importance for
efficient plant management as well as for saving fuel.


6.8.1 Process Description and Problem Statement 

Molten steel is poured into cast iron moulds to form 
ingots. As the mould cools naturally in air, the ingot 
solidifies inwardly from the surfaces, shrinking by up to
approximately 7% by volume. When sufficiently cooled, the
train of cars carrying the moulds is moved to the stripping 
yard, where the ingots are freed from the moulds; the centre 
of the ingot may still be molten. The train of the hot 
ingots is then moved to the soaking pits. 

Figure 6.8.1 (a) Schematic diagram of the ingot heating
process in a soaking pit.
(b) The pit temperature control system.

Figure 6.8.2 Typical patterns of the pit and the ingot
temperatures and the fuel flow.

A soaking pit (Fig.6.8.1(a)) is a rectangular chamber
furnace with an arrangement for loading ingots through
the sliding top cover. Gaseous or liquid fuel is fired in
the furnace and the ingots are heated largely by radiation
from the products of combustion and the refractory pit
walls. The pit temperature control schematic is shown in
Fig.6.8.1(b). Typically 5-20 hot or cold ingots, each 5 to
20 tons by weight, may be thermally soaked in the pit at one
time. The temperature of the cold incoming ingots will be
the same as the ambient temperature, whereas the temperature
distribution for hot ingots from the centre to the surface
may range from 1200-1600°C to 800-1000°C. The ingots
are heated and soaked for several hours (say, 4-8 hours) to
bring them to the desired temperature (typically, 1280°C),
uniform throughout the cross-section. The ingots are said to
be soaked through when

(a) complete solidification of the centre has taken place, 

(b) the difference between the surface temperature and 
the core temperature reaches within a specified value, 
and 

(c) the average temperature is greater than a given value. 

The heating in the soaking pit is carried out in two phases 
(see Fig.6.8.2). 

(i) Maximum flow phase: this is the initial phase, when
fuel is supplied at the maximum permissible rate to the
furnace, until the furnace temperature reaches a preset
value.

(ii) In-control phase: during this phase, the temperature
controller regulates the fuel flow to maintain the furnace
temperature (measured by the wall thermocouples) at a set
value. The fuel flow rate gradually drops and the waste
gas temperature rises until the ingots are completely
soaked, when both the fuel flow rate and the waste gas
temperature tend to stabilize.

While under-soaking of the ingots results in bad
rolling, increased rejection and possible roll damage,
over-soaking is wasteful from the pit utilization and fuel
optimization points of view.

The problem 

The main objectives are 

(i) estimation of the ingot-centre and surface tempera- 
tures, and 

(ii) prediction of the ready-to-roll time. 

The estimation and the prediction become difficult because 
of the large number of uncontrollable and unmeasurable 
factors influencing the processes, some of which are as 
follows. 

(a) The ingot characteristics: for example, the time between 
the teeming, stripping and charging of the ingots, the 
atmospheric conditions, and the ingot size and composition 
may be widely varying. 

(b) The pit characteristics: for example, the heating
dynamics, the charge distribution, bottom buildup due to
scale accumulation inside the pit, recuperator efficiency
etc. may keep changing.

Hence prediction of the ready-to-roll time based on a heat
transfer model alone will be difficult. The Kalman filtering
approach can be useful under such circumstances.

The use of the Kalman filter for estimation and prediction
of the soaking pit process is discussed in many papers, for
example Lumelsky (1983) and Wick (1982). The present study is
mainly based on the work of Lumelsky (1983). The ingot 
soaking process is expressed by a linear state-space model. 
The same model is used for the maximum-flow phase and the 
in-control phase of the soaking pit process. A heat-transfer 
model is used to generate data on the ingot-surface and the 
ingot-centre temperatures; these data together with the pit 
wall temperature measurement are used for the estimation of 
the parameters. The ingot-surface and the ingot-centre 
temperatures are estimated and predicted using the Kalman 
filter. 


6.8.2 Modelling, Estimation and Prediction 
Process model 

The ingot-soaking process is represented by the model

$$x(k+1) = Ax(k) + bu(k) + w(k), \tag{6.8.1}$$

$$y(k) = c^{T}x(k) + v(k), \tag{6.8.2}$$

where

$$x(k+1) = \begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ x_3(k+1) \end{bmatrix},$$

$x_1$, the ingot-surface temperature,
$x_2$, the ingot-centre temperature,
$x_3$, the pit wall temperature,

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad
b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}, \quad
c = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},$$

so the measurement $y = x_3$ is the pit wall temperature,
the control variable u is the fuel flow, and w and v are the
process noise and the measurement noise respectively.

From operational considerations (explained below),
$a_{23} = a_{32} = 0$ and $b_2 = 0$.


Remarks 

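(1) The assumption $a_{23} = a_{32} = b_2 = 0$ implies that the
ingot-centre temperature is a function of the ingot-surface
temperature: it is not driven directly by the pit wall
temperature or by the fuel flow, and the pit wall temperature
in turn does not depend directly on the ingot-centre
temperature.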

(2) The model (6.8.1 - 6.8.2) is a simplified model. 
Additional variables may be incorporated, if necessary. For 
example, the waste gas temperature can be used as an 
additional output measurement. 

(3) The pit wall temperature is usually the average of the 
two measurements obtained from the thermocouples located on 
the burner wall and the wall opposite, to minimize the 
effect of nonuniform heating inside the pit. 


Parameter estimation 

To estimate the parameters of the model (6.8.1), it is 
necessary to know the ingot-surface and the ingot-centre 
temperatures, which can be obtained from heat transfer 
models of the pit and the ingot; such models are discussed 
in Massey and Sheridan (1971), Hinami et al (1975), Kung et 
al (1967). Given the initial and operating conditions, e.g.,
ingot history, heating strategy etc., the temperature
profiles of the ingot-surface and the ingot-centre, and hence
the eventual ready-to-roll time, can be computed using the
heat transfer models. Based on the ingot-surface and the
ingot-centre temperatures obtained from the model, the
parameters in (6.8.1) can be determined using the least
squares method.

Initialization of the Kalman filter 

Once the parameters are known, the Kalman filter can be
used to obtain the state estimates. The Kalman filter is
initialized with assumed values for the states $x_1$, $x_2$, $x_3$,
the process noise and the measurement noise covariances, and
the estimation error covariances. It is found that the
eventual performance of the Kalman filter is not very
sensitive to imprecise assumptions at initialization.

The Kalman filter (6.6.9 - 6.6.13) is used for the
state estimation (i.e. estimation of the ingot-surface and
the ingot-centre temperatures), and the optimal predictor
(6.7.5) is used for ready-to-roll-time prediction. Some of
the filter and the predictor equations are restated here.

Estimation of ingot temperatures 

The Kalman filter produces the state estimation and the
one-step ahead prediction, given by

$$\hat{x}(k|k) = \hat{x}(k|k-1) + \mathbf{k}(k)\,(y(k) - c^{T}\hat{x}(k|k-1)), \tag{6.8.3}$$

$$\hat{x}(k+1|k) = A\hat{x}(k|k) + bu(k). \tag{6.8.4}$$

Here $\mathbf{k}$ is the Kalman gain vector. Thus at any instant k, the
estimates of the ingot-surface and the ingot-centre
temperatures are obtained from (6.8.3).

Ready-to-roll time prediction 

Here, two basic pieces of information are required: 

(i) the specification of the ready-to-roll time, and 

(ii) the future control input (i.e. the fuel flow) sequences 
or the model for the controller, which is required for the 
temperature prediction. 

The ready-to-roll time is often defined in terms of the
temperature windows within which the ingot-centre
temperature and the ingot-surface temperature must lie.
These values may be typically 1280±40°C and 1280±20°C
respectively. The temperature ranges may vary depending on
the type of steel and the rolling practice.

The p-step prediction of the ingot temperatures can be
obtained from the optimal state predictor (6.7.5),

$$\hat{x}(k+p|k) = A^{p}\hat{x}(k|k) + \sum_{i=k}^{m} A^{m-i}bu(i), \tag{6.8.5}$$

where m = k+p-1. Alternatively use

$$\hat{x}(k+p|k) = A\hat{x}(k+p-1|k) + bu(k+p-1). \tag{6.8.6}$$

The prediction equations (6.8.5) and (6.8.6) use the future
values of u, which can be computed as follows. Often a PI
(proportional + integral) controller is used for the pit
temperature control; the future control input sequences can
be obtained from

$$u(i+1) = c_p\,e(i+1) + c_q \sum_{j=i+1-N}^{i+1} e(j), \tag{6.8.7}$$

where

$$e(i+1) = y_{set}(i+1) - \hat{y}(i+1|i),$$

$$\hat{y}(i+1|i) = c^{T}\hat{x}(i+1|i).$$

e is the deviation of the pit wall temperature from the
desired set point $y_{set}$; the sequence $\{y_{set}(k)\}$ is assumed to
be known. The sum of e is the integrated value of e over the
appropriate (past) horizon N; $c_p$ and $c_q$ are the proportional
and the integral constants of the PI controller.

Thus starting from p=1, sequential multistep prediction
of the state can be computed using (6.8.6-6.8.7) to
determine the predicted ready-to-roll time.
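
The sequential prediction can be sketched as below. This is an
illustrative outline under several assumptions not made in the text:
the set point is constant, errors before the current instant are not
carried into the integral term, the predicted wall temperature at each
step is used in place of the one-step prediction in the controller
model, and the default windows are the typical values quoted above.
All function and argument names are hypothetical.

```python
def mat_vec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def predict_ready_to_roll(a, b, c, x_est, u_now, y_set, c_p, c_q, n_horizon,
                          target=1280.0, tol_surface=20.0, tol_centre=40.0,
                          max_steps=500):
    """Return (p, x_pred): the first step p at which the predicted surface
    and centre temperatures both lie inside their windows, or
    (None, x_pred) if that does not happen within max_steps.

    The state is ordered as [surface, centre, wall].
    """
    x, u = list(x_est), u_now
    errors = []                                   # predicted set-point errors e(j)
    for p in range(1, max_steps + 1):
        ax = mat_vec(a, x)
        x = [ax_i + b_i * u for ax_i, b_i in zip(ax, b)]       # as in (6.8.6)
        surface_ok = abs(x[0] - target) <= tol_surface
        centre_ok = abs(x[1] - target) <= tol_centre
        if surface_ok and centre_ok:
            return p, x
        y_pred = sum(c_i * x_i for c_i, x_i in zip(c, x))      # c^T x
        errors.append(y_set - y_pred)
        window = errors[-(n_horizon + 1):]                     # last N+1 errors
        u = c_p * errors[-1] + c_q * sum(window)               # as in (6.8.7)
    return None, x
```

The returned step count p, multiplied by the sampling interval, gives
the predicted time remaining until the ingots are ready to roll.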


Summary 

(1) Formulate the state-space model (6.8.1-6.8.2) for the
soaking pit and the ingot heating process.

(2) Specify the initial condition, and use a suitable heat
transfer model to generate the ingot-surface and the
ingot-centre temperatures against the progression of time.

(3) Use the ingot temperatures so obtained from the heat
transfer model in the state-space description (6.8.1), and
estimate the parameters of the state-space model using the
least squares method.

(4) Define the ready-to-roll time in terms of the desired
ingot temperatures. Define the parameters of the pit
temperature controller.

(5) Run the sequential multistep prediction algorithm
(6.8.6) to predict the ready-to-roll time.

(6) From the state estimates (6.8.3), indicate when the 
ready-to-roll time is actually reached. 

Remarks 

(1) If the predicted ready-to-roll time is too long ahead 
of the mill availability, the heating strategy can be 
suitably modified to suit the needs of the mill. Again, 
prediction of the ready-to-roll time well in advance helps 
schedule the operation of the battery of pits. 

(2) The time when the pit will go into in-control phase, 
can be predicted the same way as the ready-to-roll time. 

(3) In spite of large variability in the ingot soaking 
process, the presented estimation and prediction scheme 
should improve the operational consistency as well as the 
fuel efficiency. The standard deviation of the average ingot 
temperature is also expected to decrease. 


6.9 CONCLUDING REMARKS 

In this chapter, state-space modelling and optimal 
estimation of states using the Kalman filter were studied. 
Compared with the transfer function models based on 
input-output data, the state-space approach offers the 
additional flexibility of accommodating internal variables 
as states, which may not be accessible or measurable. The 
state-space approach permits use of a large number of widely 
studied and well-established methods and algorithms for 
estimation, prediction, smoothing and control. 

The Kalman filter can produce optimal state estimates 
under steady state conditions, even when the measurements 
are noisy, conditional on the noise being independent with 
Gaussian distribution. In practical applications, there are 
two basic problems: (i) the parameters of the state-space 
model may be unknown or slowly time-varying, and (ii) the 
noise may not be truly independent and Gaussian. So the 
optimality of the estimates will suffer. Nevertheless, the 
Kalman filter can produce workably good results and hence is 
widely used for real-life applications. 

The Kalman filter itself produces one-step ahead
prediction; multistep predictions are obtained through
repeated use of shorter-step predictions.


CHAPTER 7


ORTHOGONAL TRANSFORMATION AND MODELLING OF 
PERIODIC SERIES 


Two basic consequences of orthogonal transformation 
are relative decorrelation of data and compression of 
information, which can be used for modelling and 
prediction of periodic series. 


7.1 INTRODUCTION 

The theory and application of orthogonal transformation are
discussed in this chapter; the study concentrates on the
singular value decomposition and the Walsh-Hadamard
transformation.

Orthogonal transformation converts a mutually correlated
set of data into a relatively decorrelated set of transform
coefficients (or spectral components). The energy in the
data, which represents the information content, remains
conserved in the transformation process but the distribution
of energy becomes more compact following transformation
compared with the original time domain, where the energy
distribution is rather uniform. The transformation process
is linear and reversible.

Two closely related formats of orthogonal transformation
studied in this chapter are as follows:

(i) The transform y of the data vector x is obtained by
multiplying x by the orthogonal matrix W:

y = Wx.

Karhunen-Loève transformation, Walsh-Hadamard transformation
etc. are based on this format.

(ii) The data or information matrix is decomposed into
orthogonal factors. Singular Value Decomposition (SVD) has
such a format. For example, SVD of a matrix X is given by

$$X = USV^{T},$$

where U and V are orthogonal matrices, and S is a diagonal
matrix of the singular values of X.

Orthogonal transformation results in relative decorrelation
of the data and compaction of information. Hence it
is advantageous to perform modelling and prediction
operations with the transformed data; the time domain
results can be obtained through reverse transformation.

Orthogonal transformation offers some of the strongest 
tools for data analysis, modelling, prediction and filte- 
ring. It is applicable to periodic or aperiodic processes, 
linear or nonlinear processes, simple single-input single- 
output or complex multi-input multi-output processes. This 
chapter concerns the modelling of data series which are 
nearly periodic in nature; applications to quasiperiodic 
series and complex multivariables processes are studied in 
Chapters 10 and 11. 

This chapter opens with an introduction to the basics
of orthogonal transformation in Sec.7.2, where the
characteristic features and the properties are discussed. Next the
theory of Karhunen-Loève transformation (KLT) is studied in
Sec.7.3. KLT is an eigenvalue based optimal orthogonal
transformation; although the application of KLT is not
included in the modelling and prediction methods discussed
in this chapter, it is expected that a study of KLT can help
the understanding of orthogonal transformation.

The two types of transformations studied in this
chapter are the Walsh-Hadamard Transformation (WHT) and the
Singular Value Decomposition (SVD). WHT is extremely simple
in implementation, whereas SVD is one of the most
numerically robust algebraic operations. SVD features widely
in this book both as a tool for matrix operations and as
an integral part of algorithms designed for the analysis of
data and modelling of processes. Both WHT and SVD are
noncircular methods of orthogonal transformation; among the 
popular circular methods are Fast Fourier transform (FFT) 
(discussed in Chapter 2), and Discrete cosine transform 
(DCT). 

The theoretical aspects of WHT are presented in
Sec.7.4, and the applications of WHT to some real-life
prediction problems are discussed in Sec.7.5. The theory of
Singular Value Decomposition is studied along with its
characteristic features in Sec.7.6. In Sec.7.7, the
characterization of a periodic process in terms of the
decomposition components of SVD is presented, which is
central to the applications of SVD for the modelling and
prediction of periodic time series discussed in Sec.7.8.

This chapter is supported mainly by six appendices;
Appendix 7A presents details of the solutions of some of the
examples featuring in this chapter. Appendices 7B to 7F
contain some real-life time series data used in this book.


7.2 BASICS OF ORTHOGONAL TRANSFORMATION 


Orthogonality and orthonormality 


Normal functions: A function is said to be normal if its 
inner (or scalar) product with itself is unity. 


Orthogonal functions or signals: Two signals {y(k)} and
{z(k)} are said to be orthogonal over an interval k = 1 to m
if their inner product over that interval is zero. For a set
of signals $\{y_i(k)\}$, orthogonality requires

$$\sum_{k=1}^{m} y_i(k)\,y_j(k) = \begin{cases} 0 & \text{for } i \neq j, \\ C & \text{for } i = j, \end{cases}$$

C being a constant.


Orthogonal vectors: Two vectors x and y are orthogonal if
their scalar product is zero, i.e. $x^{T}y = 0$. The n-vectors
$v_1, v_2, \ldots, v_r$ are mutually orthogonal, if they are pairwise
orthogonal.


Orthonormal vectors: The n-vectors $v_1, v_2, \ldots, v_r$ are
orthonormal, if

(a) they are pairwise orthogonal:

$$v_i^{T}v_j = 0, \quad i \neq j, \quad i, j = 1, \ldots, r,$$

and

(b) they are individually normalized:

$$v_i^{T}v_i = 1, \quad i = 1, \ldots, r.$$

Such a set of vectors is called an orthonormal set. 


Orthogonal matrix 

A square matrix V is orthogonal, if and only if 

V T V = VV T = I. (7.2.1) 


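It is implied that both the row vectors and the column
vectors of the square matrix V:

$$V = \begin{bmatrix}
v_{11} & v_{12} & \cdots & v_{1n} \\
v_{21} & v_{22} & \cdots & v_{2n} \\
\vdots & \vdots &        & \vdots \\
v_{n1} & v_{n2} & \cdots & v_{nn}
\end{bmatrix}$$

form normal orthogonal systems, that is, for the columns
$v_k$ of V,

$$v_k^{T}v_j = \sum_{i=1}^{n} v_{ik}v_{ij} = \begin{cases} 1, & \text{if } k = j, \\ 0, & \text{otherwise,} \end{cases} \qquad k, j = 1, 2, \ldots, n,$$

and similarly for the rows.

An orthogonal matrix has a determinant of +1 or -1 but the
converse is not true.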

Orthogonal transformation

For any vector x, the linear transformation

y = Ax

is an orthogonal transformation, if the absolute value (or
the Euclidean length) of x is preserved through the
transformation, that is

|y| = |Ax| = |x|,

where the transformation matrix A is an orthogonal matrix:

$$A^{T}A = AA^{T} = I,$$

whereas multiplication by a general nonsingular matrix may
alter the absolute magnitude of a vector drastically. In
other words, the 2-norm of a vector x remains unchanged
under orthogonal transformation,

$$\|Ax\|_2^2 = x^{T}A^{T}Ax = x^{T}x = \|x\|_2^2.$$

Similarly, for any orthogonal matrices A and B of
appropriate dimensions, the matrix 2-norm and the Frobenius
norm of any m×n matrix Y remain invariant under orthogonal
transformation:

$$\|AYB\|_2 = \|Y\|_2,$$

$$\|AYB\|_F = \|Y\|_F.$$
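
The length-preserving property can be checked numerically with a
small sketch. The rotation matrix, the diagonal scaling matrix and the
test vector below are arbitrary choices made for illustration.

```python
import math


def mat_vec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def norm2(x):
    """Euclidean length (2-norm) of a vector."""
    return math.sqrt(sum(v * v for v in x))


def rotation(theta):
    """A 2x2 rotation matrix, which is orthogonal for any angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


x = [3.0, 4.0]                                   # |x| = 5
length_rotated = norm2(mat_vec(rotation(0.7), x))
# A nonsingular but non-orthogonal matrix changes the length.
length_scaled = norm2(mat_vec([[2.0, 0.0], [0.0, 0.5]], x))
```

Up to rounding, length_rotated equals 5, the length of x, whereas
length_scaled equals the square root of 40.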

General properties 

(a) Uniqueness: Orthogonal signals or vectors carry dis- 
criminating information and are maximally independent. 

(b) Energy conservation and compaction: The total energy in
the transformed data is the same as in the time or spatial
domain (that is, as before transformation); however, while the
energy is usually uniformly distributed in the time or
spatial domain, the orthogonal transformation results in
concentration of energy to fewer points in the transformed
data. This feature of compaction of energy can be used for
data compression, since the relatively insignificant data
elements can be discarded without any appreciable loss of
information content.

(c) Noise immunity: Operations with the transformed data 
enjoy excellent noise immunity. Perturbations in the 
original data are carried usually at much attenuated level 
in the transformed data; in any case the perturbations 
cannot be larger than that in the original data. 

(d) Linearity and reversibility: Orthogonal transformation
is a linear and reversible transformation, but not all linear
or reversible transformations are orthogonal.

(e) Physical interpretation: The physical sense in the data
is apparently destroyed by orthogonal transformation but the
transformed data are of greater computational and
statistical significance. No information is lost through the
transformation, as the transformation is reversible, and the
original data can be reconstructed by reverse
transformation.

(f) Causality: As against a transfer function model or time 
series model, which has a causal structure, no causality is 
presumed in orthogonal transform representation. 


7.3 KARHUNEN-LOÈVE TRANSFORM

Karhunen-Loève expansion or transformation (KLT) can produce
optimal representation of the information in the data, in 
reduced dimension; the optimality is in minimising the mean 
square transformation error. The basic idea is to map the 
data vector into the transform components through orthogonal 
transformation; the optimal choice of these components will 
lead to the optimal transformation, which on reverse 
transformation should produce minimum mean square error 
reconstruction of the original data. 

KLT lacks popularity because of its high computational 
requirement. However, the study of KLT is useful in 
understanding the optimal transformation aspect, which is 
unique in orthogonal transformation. 

The problem 

Let x, the observed data vector, be mapped onto y through
the transformation:

$$x = Wy = \sum_{i=1}^{n} y_i w_i, \tag{7.3.1}$$

where

$$x = [x_1, x_2, \ldots, x_n]^{T}, \quad y = [y_1, y_2, \ldots, y_n]^{T},$$

and W is an n×n orthogonal matrix:

$$W = [w_1, w_2, \ldots, w_n], \quad W^{T}W = WW^{T} = I.$$

Hence

$$y_i = w_i^{T}x.$$

Each component $y_i$ in the transform domain represents a
discriminating feature of x. The vectors $w_i$, which represent
the n-dimensional transform space, are called the basis vectors.

If $y_m$, a subset of y or a reconfigured y, is used to
represent x, then the transformation will be optimal if the
mean square error $E\|x - Wy_m\|^{2}$ is minimized.

The conditions under which the transformation is
optimal are discussed.

Optimal transformation 
Consider the transform vector

$$y_m = [y_1, y_2, \ldots, y_m, b_{m+1}, b_{m+2}, \ldots, b_n]^{T},$$

where $b_{m+1}, b_{m+2}, \ldots, b_n$ are preselected constants. Hence
the estimated data vector is

$$\hat{x} = Wy_m.$$

The estimation error is given by

$$\tilde{x} = x - \hat{x} = \sum_{i=m+1}^{n} (y_i - b_i)w_i; \tag{7.3.2}$$

the mean square error is given by

$$J = E\|\tilde{x}\|^{2}. \tag{7.3.3}$$

J is minimized for the optimum values of $b_i$ and $w_i$. It can
be shown that following (7.3.2-7.3.3) the value of $b_i$ that
minimizes J is given by

$$b_i = w_i^{T}\bar{x}, \tag{7.3.4}$$

where $\bar{x} = E\{x\}$. For this value of $b_i$, following (7.3.4,
7.3.3 and 7.3.2), J becomes

$$J = \sum_{i=m+1}^{n} w_i^{T}P_x w_i,$$

where

$$P_x = E\{(x - \bar{x})(x - \bar{x})^{T}\}$$

is the covariance matrix of x. The optimum value of $w_i$ is
given by

$$P_x w_i = k_i w_i,$$

which by definition implies that $k_i$ are the eigenvalues $\lambda_i$,
and $w_i$ are the corresponding eigenvectors of the covariance
matrix $P_x$. Hence the minimum mean square error is given by

$$J_{\min} = \sum_{i=m+1}^{n} \lambda_i. \tag{7.3.5}$$

The conditions for the optimal transform are derived in
Ahmed and Rao (1975, p.200).

The expansion of the data vector x in terms of the
eigenvectors of its covariance matrix: x = Wy, as in
(7.3.1), is referred to as the Karhunen-Loève expansion, and
$y = W^{T}x$ is called the Karhunen-Loève transform.

Remark: In statistics, Principal Component Analysis (PCA)
refers to the optimal linear transformation $y = W^{T}x$. Again,
the Hotelling transform in communication theory has the same
structure.

Characteristic features 

(a) Uncorrelated features: Since $y = W^T x$, the covariance of
y is diagonal:

$$P_y = W^T P_x W = \mathrm{Diag}[\lambda_1, \lambda_2, \ldots, \lambda_n].$$

Hence the feature elements $y_i$ are mutually uncorrelated. If
$\{x_i\}$ have normal distribution, the features $\{y_i\}$ will be
mutually independent.

(b) Optimal feature selection and ordering: The relative
richness of information contained in the transform elements
$y_i$ depends on the relative magnitude of the corresponding
eigenvalues $\lambda_i$. If the eigenvalues are arranged in order of
nonincreasing magnitude:

$$\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \ldots \geq \lambda_n \geq 0,$$

then the ordering of the features $y_1, y_2, y_3, \ldots$ will
conform to a descending degree of importance. This reordered
format is called the Generalized Karhunen-Loève transform.
If n-dimensional x is now represented by r significant
transformed elements as

$$y_r = [y_1, y_2, \ldots, y_r]^T, \qquad r \ll n,$$

then, following (7.3.5), the mean square error is

$$\sum_{i=r+1}^{n} \lambda_i,$$

where $\lambda_{r+1}, \ldots, \lambda_n$ are the lowest (n-r) eigenvalues, and
hence $y_r$ will provide the minimum mean square error feature
measurement of y. Thus KLT offers a procedure for feature
selection and ordering which is statistically optimal.

(c) Computational aspects: Although KLT is an optimal
transformation, its main drawback is the high computational
requirement, which is of the order of $n^2$ multiplications
for the transformation of an n-vector. There being no fast
algorithms available for KLT, suboptimal transforms for which
fast algorithms are available, e.g., Fast Fourier Transform,
Walsh Transform, Discrete Cosine Transform etc., are often used.
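The properties (a) and (b) can be checked numerically. The following is a minimal sketch in Python with NumPy (neither is used in this book); the helper name `klt`, the synthetic data and the choice r = 2 are assumptions made only for illustration, and the expectation E(.) is replaced by the sample average.

```python
import numpy as np

def klt(X):
    """KLT basis for data vectors stored as the rows of X (hypothetical helper)."""
    x_bar = X.mean(axis=0)                       # sample estimate of E(x)
    P_x = np.cov(X, rowvar=False, bias=True)     # sample covariance matrix P_x
    lam, W = np.linalg.eigh(P_x)                 # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]                # reorder to nonincreasing eigenvalues
    return lam[order], W[:, order], x_bar

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4)) + 3.0   # correlated test data

lam, W, x_bar = klt(X)
Y = X @ W                                        # y = W^T x for every data vector
P_y = np.cov(Y, rowvar=False, bias=True)         # expected: Diag[lambda_1 .. lambda_n]

r = 2                                            # number of retained features
b = W.T @ x_bar                                  # b_i = w_i^T x_bar, as in (7.3.4)
Y_m = Y.copy()
Y_m[:, r:] = b[r:]                               # discarded features replaced by constants
X_hat = Y_m @ W.T                                # x_hat = W y_m
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))  # sample version of (7.3.3)
```

With the sample covariance used throughout, `P_y` comes out diagonal and `mse` equals the sum of the discarded eigenvalues `lam[r:]`, in line with (7.3.5).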


7.4 WALSH-HADAMARD TRANSFORM 

The Walsh-Hadamard transform is one of the most widely used 
nonsinusoidal, suboptimal orthogonal transforms. While 
Fourier transform decomposes signals or data sequences into 
sinusoidal components, Walsh transform decomposes the same 
into rectangular pulse sequences called Walsh functions. As 
proposed by J.L. Walsh in 1923, the Walsh functions form a 
complete set of orthogonal rectangular pulses of magnitude 
+1 or -1. 

Fig.7.4.1 shows a set of eight Walsh functions, where 
the following structural characteristics may be noted: 

(a) A Walsh function is completely specified by the time
period t and an ordering index i:

{wal(i,t), i = 0, 1, ..., n-1},

where $n = 2^p$, p being a positive integer; n functions
constitute a particular set.

(b) The time period is divided into n equal subintervals.
The function changes sign only where t is a multiple of a
power of 1/2.

(c) The functions are arranged in increasing order of
zero-crossings. The number of zero-crossings within a unit
time period is called sequency.
[Figure: the eight Walsh functions wal(0,t) to wal(7,t), plotted against t from 0 to 1.]

Figure 7.4.1 Walsh ordered continuous-time Walsh
functions, n = 8.


                                                Sequency

         1   1   1   1   1   1   1   1             0
         1   1   1   1  -1  -1  -1  -1             1
         1   1  -1  -1  -1  -1   1   1             2
  H_n =  1   1  -1  -1   1   1  -1  -1             3
         1  -1  -1   1   1  -1  -1   1             4
         1  -1  -1   1  -1   1   1  -1             5
         1  -1   1  -1  -1   1  -1   1             6
         1  -1   1  -1   1  -1   1  -1             7


Figure 7.4.2 Matrix representation of Walsh ordered 
discrete-time Walsh functions, n = 8. 


If the continuous-time Walsh functions in a unit interval,
as shown in Fig.7.4.1, are sampled at n equidistant
sub-intervals, an equivalent discrete-time representation in
terms of an nxn matrix results (Fig.7.4.2). This matrix $H_n$
belongs to a class of matrices called Hadamard matrices.
Because of the representation of the Walsh functions in
terms of Hadamard matrices, the corresponding orthogonal
transformation is also called Walsh-Hadamard transformation
(WHT). The sampled sets of Walsh functions, given by the
rows of $H_n$, are called the basis functions.

The discrete Walsh functions are

(a) mutually orthogonal:

$$\sum_{t=0}^{n-1} \mathrm{wal}(i,t)\,\mathrm{wal}(j,t) = \begin{cases} 0, & \text{for } i \neq j, \\ n, & \text{for } i = j, \end{cases}$$

and

(b) symmetric: wal(i,t) = wal(t,i).


7.4.1 Generation of Walsh Functions using 
Hadamard Matrices 

There are various procedures for the generation of the Walsh 
functions by software and hardware means (Beauchamp, 1984, 
Alexandridis, 1989). One of the most efficient methods of 
performing WHT is by using Hadamard matrices. 

The Hadamard matrix H, originally introduced by French 
mathematician M.J. Hadamard in 1893, is an nxn square matrix 
of +1 and -1 elements with the following features: 

(a) $H^T H = nI$, I being the identity matrix

(b) $n = 2^p$, p being a positive integer

(c) H can be reconfigured such that all elements of the 1st
    row and the 1st column are positive, and both sum up
    to n.

The lowest order Hadamard matrix is of order 2:

$$H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

The higher order matrices can be generated recursively as

$$H_n = \begin{bmatrix} H_{n/2} & H_{n/2} \\ H_{n/2} & -H_{n/2} \end{bmatrix}, \qquad (7.4.1)$$

where $n = 2^p$, $n/2 = 2^{p-1}$, and p = 2, 3, 4 etc. For example, $H_8$
can be expressed as shown in Fig.7.4.3. The recursive
relation given by (7.4.1) may also be expressed as

$$H_n = H_{n/2} \otimes H_2, \qquad (7.4.2)$$

where $\otimes$ stands for the direct or Kronecker product, which
implies replacement of each element h of $H_{n/2}$ by the block $hH_2$.

Note that the rows of the Hadamard matrix in Fig.7.4.3,
generated by the recursive relationship (7.4.1), represent
the same set of eight sampled Walsh functions as in
Fig.7.4.2; however, the rows in Fig.7.4.3 are said to be
natural ordered or Kronecker ordered, unlike those of
Fig.7.4.2, which are sequency ordered.
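The recursion (7.4.1) and the two orderings can be illustrated by a short program. This is a minimal sketch in Python with NumPy, not the generation procedure of the cited references; the function names `hadamard`, `sequency` and `walsh` are assumptions introduced here.

```python
import numpy as np

def hadamard(n):
    """Natural (Kronecker) ordered Hadamard matrix built with the recursion (7.4.1)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def sequency(row):
    """Number of sign changes along one row."""
    return int(np.sum(row[:-1] != row[1:]))

def walsh(n):
    """Sequency ordered matrix: the rows of hadamard(n) sorted by sequency."""
    H = hadamard(n)
    order = np.argsort([sequency(r) for r in H])
    return H[order]

H8 = hadamard(8)                              # natural ordered rows
W8 = walsh(8)                                 # sequency ordered rows
seq_natural = [sequency(r) for r in H8]       # sequency of each natural ordered row
```

For n = 8 the natural ordered rows have sequencies 0, 7, 3, 4, 1, 6, 2, 5, so sorting by sequency only permutes the rows; the same eight functions appear in both orderings.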



         H_2   H_2   H_2   H_2
  H_8 =  H_2  -H_2   H_2  -H_2
         H_2   H_2  -H_2  -H_2
         H_2  -H_2  -H_2   H_2

                                                Sequency

         1   1   1   1   1   1   1   1             0
         1  -1   1  -1   1  -1   1  -1             7
         1   1  -1  -1   1   1  -1  -1             3
      =  1  -1  -1   1   1  -1  -1   1             4
         1   1   1   1  -1  -1  -1  -1             1
         1  -1   1  -1  -1   1  -1   1             6
         1   1  -1  -1  -1  -1   1   1             2
         1  -1  -1   1  -1   1   1  -1             5

Figure 7.4.3 Natural or Kronecker ordered nxn
Hadamard matrix H_n, n = 8.


There are fast algorithms available both for sequency
ordered as well as natural ordered Walsh-Hadamard
transforms. The total number of additions and subtractions
required for implementation is $n \log_2 n$.
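The operation count can be seen from a butterfly implementation: there are $\log_2 n$ stages, and each stage performs n additions or subtractions. The following is a minimal sketch of a natural ordered fast transform in Python with NumPy; it is one common in-place scheme, not necessarily the algorithm of the references cited above, and the name `fwht` is an assumption.

```python
import numpy as np

def fwht(x):
    """Unscaled natural ordered fast WHT: returns H_n x for n = len(x), a power of 2."""
    y = np.array(x, dtype=float)
    n = y.size
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    h = 1
    while h < n:                                   # log2(n) stages
        for start in range(0, n, 2 * h):           # n/2 butterflies per stage
            a = y[start:start + h].copy()
            b = y[start + h:start + 2 * h].copy()
            y[start:start + h] = a + b
            y[start + h:start + 2 * h] = a - b
        h *= 2
    return y

y4 = fwht([2, -1, 3, 5])                           # H_4 x, without any scaling
y8 = fwht([-6, -6, -4, -1, 1, 4, 6, 6]) / 8        # scaled by 1/n
```

Dividing the result by n gives the scaled transform; no multiplications are needed other than this final scaling, which is a shift when n is a power of 2.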

A summary of the basic properties of Hadamard matrices used
in WHT follows.

(a) The Hadamard matrix $H_n$ is an nxn square symmetric
matrix of +1 and -1 elements, where $n = 2^p$, p being a
positive integer.

(b) $H_n H_n^T = n I_n$ and $H_n^{-1} = \frac{1}{n} H_n$. (7.4.3)

(c) All rows (as well as columns) of $H_n$ add up to an n
vector: [n 0 0 ... 0].

(d) A matrix $H_{mn}$, generated by the Kronecker product of two
Hadamard matrices $H_m$ and $H_n$, is a symmetric matrix of
order mn:

$$H_{mn} = H_m \otimes H_n, \qquad H_{mn} H_{mn}^T = mn\, I_{mn}.$$


7.4.2 One-Dimensional WHT 

The Walsh-Hadamard transform of an n-data vector x is given
by

$$x_w = \frac{1}{n} H_n x, \qquad n = 2^p, \quad p = 1, 2, 3, \ldots \qquad (7.4.4)$$

where $H_n$ is the nxn Hadamard matrix, and $x_w$ is the transformed
vector. Note that if the data set is of some arbitrary
length, it has to be either truncated or extended with
dummy data, such that it is n-long.

Using (7.4.3), the reverse transform from the Walsh domain
to the time or spatial domain is obtained as

$$x = H_n x_w. \qquad (7.4.5)$$

Example

If $x = [2\ \ -1\ \ 3\ \ 5]^T$, n = 4,

$$x_w = \frac{1}{4} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 3 \\ 5 \end{bmatrix},$$

$$x_w = [9/4\ \ 1/4\ \ -7/4\ \ 5/4]^T.$$


Characteristic features 

WHT of some typical data patterns are shown in Fig.7.4.4;
the following features may be noted.

(a) $x_w(1)$ gives the average value of the signal. It is the
same as the zero-frequency term of the trigonometric Fourier
series. $x_w(1) = 0$ implies that the data vector is zero mean.

(b) $x_w(2)$ is indicative of asymmetry about the centre; a
small magnitude means that the signal is relatively
symmetric about the centre line.

(c) For a strictly symmetric signal of length $2^p$, p = 1, 2,
etc., the number of prime components will be p with the rest
$2^p - p$ terms being zero.

(d) The energy in x is n times that of $x_w$, where $n = 2^p$;
from (7.4.3) and (7.4.5), $x^T x = x_w^T H_n^T H_n x_w = n\, x_w^T x_w$.
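The example of this section and features (a) and (d) can be verified directly from the matrix form. The sketch below uses Python with NumPy, which is an assumption of this illustration; with the 1/n scaling of (7.4.4), the sum of squares of x works out to n times that of the transformed vector.

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]])
H4 = np.kron(H2, H2)             # natural ordered H_4

x = np.array([2, -1, 3, 5])
x_w = H4 @ x / 4                 # forward transform (7.4.4)
x_back = H4 @ x_w                # reverse transform (7.4.5)

energy_x = np.sum(x ** 2)        # 4 + 1 + 9 + 25
energy_xw = np.sum(x_w ** 2)     # (81 + 1 + 49 + 25) / 16
```

The first transform element equals the mean of the data, and the reverse transform returns the original vector.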
Data points:      x   = [-6 -6 -4 -1 1 4 6 6]^T
WHT components:   x_w = [0 -0.75 -1.75 0 -4.25 0 0 0.75]^T

Figure 7.4.4 One-dimensional Walsh-Hadamard transform
of some typical data patterns.
Remark: Implementation of WHT

Since WHT involves only additions, subtractions and division
by $2^p$, the computational load is minimal; WHT can also be
easily implemented in hardware.

Example 7.4.2 WHT of the atmospheric CO2 series

Consider the monthly data for 10 years for the atmospheric
CO2 variations given in Appendix 1C. The data for each year
are arranged into the rows of a matrix A; each row of A is
appended with 4 zero elements to make the row length equal
to $2^4$. A and its WHT matrix $A_w$ are as follows.

A =

 315.16 315.97 316.37 317.40 317.96 317.82 316.23 314.54 313.60 313.03 314.57 315.32 0 0 0 0
 316.10 316.68 317.37 318.79 319.63 319.29 317.86 315.55 313.85 313.64 314.61 315.81 0 0 0 0
 316.54 317.34 318.12 319.06 320.20 319.44 318.24 316.52 314.57 315.13 315.75 316.73 0 0 0 0
 317.70 318.29 319.37 320.25 320.84 320.43 319.35 317.13 316.01 315.19 316.42 317.47 0 0 0 0
 318.45 318.82 319.72 321.06 321.87 321.22 319.44 317.48 315.89 315.83 316.72 317.98 0 0 0 0
 319.32 320.36 320.82 322.06 322.17 321.95 321.20 318.81 317.82 317.37 318.93 319.09 0 0 0 0
 319.94 320.98 321.81 323.03 323.36 323.11 321.65 319.64 317.86 317.25 319.06 320.26 0 0 0 0
 321.65 321.81 322.36 323.67 324.17 323.39 321.93 320.29 318.58 318.60 319.98 321.25 0 0 0 0
 321.88 322.47 323.17 324.23 324.88 324.75 323.47 321.34 319.56 319.45 320.45 321.92 0 0 0 0
 323.40 324.21 325.33 326.31 327.01 326.24 325.37 323.12 321.85 321.31 322.31 323.72 0 0 0 0



A_w =

 3788.0  -0.2  -0.9  -0.0  1254.9  -3.8  -10.9  3.1  1274.9  0.2  5.6  -2.7  -1258.2  -3.5  -4.4   0.4
 3799.2  -0.3  -0.8   0.3  1254.5  -5.6  -11.8  4.2  1283.4  1.6  5.1  -2.5  -1261.3  -3.7  -6.0   1.4
 3807.6  -0.8  -1.2  -0.4  1258.8  -5.8  -11.0  1.5  1283.3  2.3  4.4  -1.2  -1265.5  -2.7  -5.4   0.7
 3818.5   0.9  -1.5   0.3  1263.0  -4.3  -11.1  4.0  1288.3  1.4  3.9  -3.4  -1267.2  -3.9  -5.7   0.2
 3824.5  -0.3  -0.3   1.0  1264.5  -5.5  -12.7  3.6  1291.6  2.1  5.6  -1.7  -1268.4  -3.1  -6.7   1.0
 3839.9   0.6  -1.9  -1.4  1271.6  -4.6  -10.1  3.0  1293.5  0.0  3.7  -2.6  -1274.8  -5.2  -4.5   1.8
 3848.0  -0.6  -2.9   0.2  1272.4  -5.1  -13.3  3.7  1299.1  0.6  5.5  -3.4  -1276.4  -3.9  -4.9   0.1
 3857.7  -0.3  -1.3   1.5  1278.1  -5.2  -12.0  3.3  1300.9  2.2  6.8  -1.0  -1278.7  -2.6  -3.9   0.8
 3867.6  -0.8  -1.6   0.0  1278.7  -5.3  -11.2  4.1  1304.8  2.0  5.1  -3.1  -1284.1  -2.6  -4.5   0.9
 3890.2   0.4  -2.1   0.6  1286.7  -5.7  -11.7  3.6  1311.8  2.1  3.6  -3.3  -1291.7  -3.9  -5.9  -0.3


Note that only some of the columns of $A_w$ (that is columns 1,
5, 9, and 13) carry the major part of the information. The
entries of $A_w$ shown above are the unscaled products $AH_{16}$
(the first element of each row is the sum of the 12 monthly
values); after the 1/n scaling of (7.4.4), the sums of squares
of the elements of the individual columns (1 to 16) are found
to be as follows:

   columns  1 to  4:   574268.202,    0.013,    0.103,    0.024,
   columns  5 to  8:    62841.724,    1.028,    5.264,    0.474,
   columns  9 to 12:    65326.371,    0.109,    0.987,    0.268,
   columns 13 to 16:    63268.571,    0.502,    1.078,    0.032.

The implication is that A can be reconstructed with little
loss of information using only 4 columns of $A_w$. This feature
is used in the prediction of periodic series as discussed in
Sec. 7.5.
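The selection of dominant columns can be reproduced on a part of the data. The sketch below, in Python with NumPy (an assumption of this illustration), uses only the first three years of the CO2 matrix A and natural ordered $H_{16}$; the variable names and the choice of keeping 4 columns follow the example, while the relative-error measure is an addition made here.

```python
import numpy as np

# First three rows (years) of the CO2 matrix A, 12 monthly values each.
rows = np.array([
    [315.16, 315.97, 316.37, 317.40, 317.96, 317.82,
     316.23, 314.54, 313.60, 313.03, 314.57, 315.32],
    [316.10, 316.68, 317.37, 318.79, 319.63, 319.29,
     317.86, 315.55, 313.85, 313.64, 314.61, 315.81],
    [316.54, 317.34, 318.12, 319.06, 320.20, 319.44,
     318.24, 316.52, 314.57, 315.13, 315.75, 316.73],
])
A = np.hstack([rows, np.zeros((rows.shape[0], 4))])    # pad each row to length 16

H16 = np.array([[1.0]])
for _ in range(4):                                     # recursion (7.4.1), four times
    H16 = np.block([[H16, H16], [H16, -H16]])

A_w = A @ H16 / 16                                     # WHT of every row, scaled as in (7.4.4)
col_energy = np.sum(A_w ** 2, axis=0)                  # sum of squares of each column
dominant = np.sort(np.argsort(col_energy)[-4:])        # 0-based indices of the 4 largest

A_w_kept = np.zeros_like(A_w)
A_w_kept[:, dominant] = A_w[:, dominant]               # discard the other 12 columns
A_rec = A_w_kept @ H16                                 # reverse transform (7.4.5)
rel_err = np.linalg.norm(A - A_rec) / np.linalg.norm(A)
```

The 0-based indices 0, 4, 8 and 12 correspond to columns 1, 5, 9 and 13 of the text, and the reconstruction from these four columns differs from A by well under one per cent in the Frobenius norm.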


7.4.3 Two-Dimensional WHT 


Often the data set arranged in the form of a matrix may have
correlation both in the direction of rows and columns, which
justifies two-dimensional Walsh-Hadamard transformation:

$$X_w = \frac{1}{n_1 n_2} H_{n_1} X H_{n_2},$$

where X is the $n_1 \times n_2$ data matrix, $X_w$ is its transform, and
$n_1 = 2^{p_1}$, $n_2 = 2^{p_2}$, $p_1$ and $p_2$ being positive integers. The
reverse transformation is given by

$$X = H_{n_1} X_w H_{n_2}.$$

If $n_1 = n_2 = n$,

$$X_w = \frac{1}{n^2} H_n X H_n \quad \text{and} \quad X = H_n X_w H_n.$$

Since $H_n^{-1} = \frac{1}{n} H_n$, $X_w$ may also be expressed as

$$n X_w = H_n^{-1} X H_n,$$

which is in similarity transformation format, $[nX_w]$ and X
being similar matrices.

Remark: Similarity transformation

Two square matrices X and Y are similar, if there is a
nonsingular matrix G such that $Y = G^{-1}XG$; G is called the
similarity transformation matrix. If X and Y are similar,
their eigenvalues, characteristic polynomials and
determinants are equal. If v is an eigenvector of X, $G^{-1}v$ will be
an eigenvector of Y. Y may not be a diagonal matrix. A
square matrix X is called defective, if it is not diagonalizable
by similarity transformation.

The two-dimensional WHT is applicable for data compression
where the information is contained in two-dimensional space;
such applications for image processing are discussed in
Alexandridis (1989).
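The two-dimensional transform pair and the similarity property can be checked on a small matrix. This is a minimal sketch in Python with NumPy; the function names `wht2` and `iwht2` and the 4x4 test matrix are assumptions made for illustration (the matrix is chosen symmetric only so that a symmetric eigenvalue routine can be used).

```python
import numpy as np

def hadamard(n):
    """Natural ordered Hadamard matrix of order n (a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def wht2(X):
    """Two-dimensional WHT: X_w = H_n1 X H_n2 / (n1 n2)."""
    n1, n2 = X.shape
    return hadamard(n1) @ X @ hadamard(n2) / (n1 * n2)

def iwht2(X_w):
    """Reverse two-dimensional WHT: X = H_n1 X_w H_n2."""
    n1, n2 = X_w.shape
    return hadamard(n1) @ X_w @ hadamard(n2)

X = np.array([[4.0, 1.0, 0.0, 2.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 5.0, 1.0],
              [2.0, 0.0, 1.0, 6.0]])       # made-up symmetric data matrix, n = 4

X_w = wht2(X)
X_back = iwht2(X_w)
eig_X = np.linalg.eigvalsh(X)              # eigenvalues of X
eig_nXw = np.linalg.eigvalsh(4 * X_w)      # eigenvalues of n X_w
```

The reverse transform recovers X, and the eigenvalues of $nX_w$ coincide with those of X, as expected for similar matrices.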
7.5 PREDICTION BASED ON WHT
7.5.1 Basic Principle


The basic idea is to extrapolate the data sequence following
transformation, and to perform reverse transformation to
produce the prediction.

Let X be the mxn matrix containing process data, where
the process has n discrete points in each period and the
data for m consecutive periods are arranged in the
consecutive rows. The objective is to predict the (m+1)th
row of X.

The prediction is performed as follows:

(1) Form A by appending the required number of zeros to each
    row of X such that the row length is $2^p$, p being a
    positive integer.

(2) Compute the WHT of A, given by

    $$A_w^T = \frac{1}{n} H_n A^T, \qquad n = 2^p,$$

    that is, each row of $A_w$ is the WHT of the corresponding
    row of A.

(3) Determine the columns of $A_w$ which are relatively
    dominant. Model the series of successive elements of
    each of these columns of $A_w$ as an autoregressive
    process; estimate the parameter values, and produce
    predicted values of the series using the estimated
    parameters. For the nondominant columns of $A_w$, the m-th
    column element may be assumed to be the predicted value
    of the (m+1)th element. Thus the predicted (m+1)th row
    of $A_w$ is produced.

(4) On reverse WHT, the (m+1)th extrapolated row of $A_w$
    will represent the predicted (m+1)th period (where the
    first n data elements are considered).
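Steps (1) to (4) can be sketched as follows in Python with NumPy. This is an illustrative sketch under stated assumptions, not the program used for the examples of this book: the function names, the second order autoregressive model with a constant term fitted by least squares, the energy criterion for dominance and the synthetic data are all choices made here.

```python
import numpy as np

def hadamard(n):
    """Natural ordered Hadamard matrix of order n (a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def ar2_forecast(x):
    """Least squares fit of x(k) = f0 + f1 x(k-1) + f2 x(k-2) + e(k),
    followed by the one-step ahead prediction from the last two values."""
    x = np.asarray(x, dtype=float)
    Phi = np.column_stack([np.ones(x.size - 2), x[1:-1], x[:-2]])
    f, *_ = np.linalg.lstsq(Phi, x[2:], rcond=None)
    return f[0] + f[1] * x[-1] + f[2] * x[-2]

def wht_predict(X, n_dominant):
    """Predict the (m+1)th period from the m periods stored as rows of X."""
    m, n = X.shape
    N = 1 << (n - 1).bit_length()                 # smallest power of 2 >= n
    A = np.hstack([X, np.zeros((m, N - n))])      # step 1: zero padding
    H = hadamard(N)
    A_w = A @ H / N                               # step 2: WHT of every row
    energy = np.sum(A_w ** 2, axis=0)
    dominant = np.argsort(energy)[-n_dominant:]   # step 3: dominant columns
    row = A_w[-1].copy()                          # nondominant: repeat the m-th value
    for j in dominant:
        row[j] = ar2_forecast(A_w[:, j])
    return (H @ row)[:n]                          # step 4: reverse WHT, first n points

# Synthetic test: a fixed 12-point pattern whose amplitude changes from period to period.
pattern = np.repeat([3.0, 5.0, 4.0], 4)
amp = lambda i: 1 + 0.5 * 0.8 ** i + 0.3 * (-0.5) ** i
X = np.array([amp(i) * pattern for i in range(10)])
x_next = wht_predict(X, n_dominant=4)
```

The test pattern is piecewise constant over blocks of four samples, so only four columns of the transform are nonzero, and the amplitude sequence obeys a second order recursion exactly; the predicted 11th period therefore matches `amp(10) * pattern`. Real data will not be reproduced exactly.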


7.5.2 Prediction of Power Load on a Substation 

The electrical power load on a substation principally shows
two periodicities: the daily periodicity and the weekly
periodicity. A further periodicity related to the yearly
variations is often present. Usually the daily periodicity
is most important, which is addressed in this example. The
objective is to model the 24-hourly electrical load
variations of a substation on Mondays and to produce one week
ahead prediction of the 24-hourly power demand. Hourly power
load data for 10 consecutive Mondays are used for this
study, and the load for the 11th Monday is predicted.

[Figure: electrical load (MWH) plotted against the number of Mondays.]

Figure 7.5.1 Reconstruction and prediction of electrical
power load data for 9th to 11th Mondays using WHT;
the first two are reconstructed periods, and the last
period is the one week ahead prediction.

The available hourly data for the consecutive Mondays
are sequentially arranged into the rows of a matrix A and
each row is extended from 24 to 32 with appended zeros. The
10x32 matrix A is WH-transformed to $A_w$. It is found that 11
columns of $A_w$ are relatively dominant. The successive
elements of each of these columns are treated as separate time
series {x(k)}, and are modelled as

$$x(k) = f_0 + f_1 x(k-1) + f_2 x(k-2) + e(k), \qquad (7.5.1)$$

where e(k) is the noise term; the parameters are estimated
using the least squares estimator; the parameter values and
other details are given in Appendix 7A.1. For each sequence,
one-step ahead prediction is obtained as

$$x(k+1|k) = f_0 + f_1 x(k) + f_2 x(k-1). \qquad (7.5.2)$$

The predicted 11th row of $A_w$ is reverse transformed, the
first 24 points of which are the predicted values of the
power load of the following Monday.

The results are shown in Fig.7.5.1; the data for the
9th and the 10th Mondays are reconstructed using 11 dominant
columns of $A_w$ and the prediction of the 11th Monday is
produced as detailed above. The MSE for prediction works out
to be 114.051.
Remarks

(i) WHT can be applied for the periodic prediction of
nearly repetitive series. If the periodic pattern or the
magnitude over the periods vary, better predictions may be
produced using SVD based methods (see Secs. 7.8 and 11.2).

(ii) The main advantage of WHT is the computational
simplicity and the hardware implementability, although the
degree of compaction of information through WHT is not as
much as in SVD.


7.6 SINGULAR VALUE DECOMPOSITION (SVD) 

Singular value decomposition is an optimal orthogonal
decomposition which finds wide applications in rank
determination and inversion of matrices, as well as in the
modelling, prediction, filtering and information
compression of data sequences. SVD is closely related to
Karhunen-Loève transformation, singular values being
uniquely related to eigenvalues, although the computational
requirements of SVD are less than those of KLT. From a numerical
point of view, SVD is extremely robust, and the singular
values in SVD can be computed with greater computational
accuracy than eigenvalues.

SVD is popularly used for the solution of least squares 
problems; it offers an unambiguous way of handling rank 
deficient or nearly rank deficient least squares problems. 
SVD is also the most definitive method for the detection of 
the rank of a matrix or the nearness of a matrix to loss of 
rank. 

In this book, SVD has been widely used for modelling 
and prediction as well as for algebraic matrix operations. 


7.6.1 Introduction to Singular Value Decomposition 

Given any mxn real matrix A, there exist an mxm real
orthogonal matrix U, an nxn real orthogonal matrix V and an mxn
diagonal matrix S, such that

$$A = USV^T, \qquad S = U^T A V, \qquad (7.6.1)$$

where the elements of S can be arranged in nonincreasing
order, that is

(i) for a full-rank A,

$$S = \mathrm{diag}\{s_1, s_2, \ldots, s_p\}, \qquad p = \min(m,n),$$

$$s_1 \geq s_2 \geq s_3 \geq \ldots \geq s_p > 0, \text{ or}$$

(ii) for A of rank r,

$$s_1 \geq s_2 \geq \ldots \geq s_r > 0 \text{ and } s_{r+1} = s_{r+2} = \ldots = s_p = 0.$$

In other words, $U^T U = UU^T = I$, $V^T V = VV^T = I$, and

$$S = \begin{bmatrix} \mathrm{diag}\{s_1, s_2, \ldots, s_p\} \\ 0 \end{bmatrix} \quad \text{for } m > n = p.$$

The decomposition (7.6.1) is called the singular value
decomposition. For proof of this theorem see Golub and Van
Loan (1989, p.71). The numbers $s_1, s_2, s_3, \ldots, s_p$ are the
singular values (or principal values) of A. U and V are
called the left and right singular vector matrices of A
respectively. U and V can be expressed as

$$U = [u_1\ u_2\ \ldots\ u_i\ \ldots\ u_m], \text{ and}$$

$$V = [v_1\ v_2\ \ldots\ v_i\ \ldots\ v_n],$$

where for i = 1 to p, the m-column vector $u_i$ and the n-column
vector $v_i$, which correspond to the i-th singular value $s_i$,
are called the i-th left singular vector and the i-th right
singular vector respectively.

Again

$$AA^T = USV^T VS^T U^T = US^2U^T. \qquad (7.6.2)$$

Hence, the columns of U are the m orthonormal eigenvectors
of $AA^T$; the diagonal matrix $S^2$ is the eigenvalue matrix of
$AA^T$ having the (positive) eigenvalues, $s_1^2, s_2^2, \ldots, s_p^2$, on
the diagonal. Similarly

$$A^TA = VS^2V^T. \qquad (7.6.3)$$

So the columns of V are the orthonormal eigenvectors of
$A^TA$, and the nonzero eigenvalues of $A^TA$ are the same as those of
$AA^T$. Thus the singular values of A: $s_1, s_2, \ldots, s_p$, are the
positive square roots of the eigenvalues of $AA^T$ or of $A^TA$.

The i-th left and right singular vectors, $u_i$ and $v_i$
respectively, corresponding to the i-th singular value $s_i$
are unique up to a common change of sign, provided $s_i$ is
distinct from the other singular values.

Remark: Eigenvalue-eigenvector decomposition
With reference to (7.6.2) and (7.6.3), for any symmetric
matrix X, if $X = QDQ^T$, where Q is orthogonal and D is a real
diagonal matrix, then the diagonal elements of D are the
eigenvalues of X, and the column vectors of Q are the
eigenvectors of X. The i-th eigenvector $q_i$ is associated
with the i-th eigenvalue $d_i$, satisfying the relationship
$Xq_i = d_i q_i$. The decomposition $X = QDQ^T$ is called the
spectral decomposition of X.
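The relations (7.6.1) to (7.6.3) can be verified numerically. The following sketch uses `numpy.linalg.svd` and `numpy.linalg.eigvalsh` from NumPy; the use of Python and the random 5x3 test matrix are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                  # arbitrary test matrix, m = 5, n = 3

U, s, Vt = np.linalg.svd(A)                  # U is 5x5, Vt = V^T is 3x3, s is nonincreasing
S = np.zeros((5, 3))
S[:3, :3] = np.diag(s)                       # mxn diagonal matrix of (7.6.1)

eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # eigenvalues of A^T A
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # eigenvalues of A A^T
```

The three eigenvalues of $A^TA$ equal the squared singular values; $AA^T$ has the same three nonzero eigenvalues and two further eigenvalues that are zero to rounding accuracy.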


7.6.2 Characteristic Features of SVD 
Basic features 

(i) The number of nonzero singular values of a matrix A is
the rank of A.

(ii) Both the 2-norm and the Frobenius norm can be defined
in terms of the singular values:

$$\|A\|_2 = s_1, \text{ the largest singular value of A}; \qquad (7.6.4)$$

$$\|A\|_F = (s_1^2 + s_2^2 + \ldots + s_p^2)^{1/2}, \qquad p = \min(m,n). \qquad (7.6.5a)$$

(iii) From energy considerations,

$$\sum_{i=1}^{m} \sum_{j=1}^{n} (a_{ij})^2 = \|A\|_F^2 = s_1^2 + s_2^2 + \ldots + s_p^2, \qquad (7.6.5b)$$

$a_{ij}$ being the jth element of the i-th row of A, and since

$$A = USV^T = \sum_{i=1}^{p} u_i s_i v_i^T,$$

($u_i$ and $v_i$ being the columns of U and V respectively),
the energy in the decomposed component matrix
$u_i s_i v_i^T$ is $s_i^2$.

Rank characterization

The number of nonzero singular values indicates the effective
rank of a matrix. The singular values also indicate
precisely how close a given matrix is to a matrix of lower
rank, which can be explained as follows.

For an mxn full-rank matrix A,

$$U^T A V = \mathrm{diag}(s_1, s_2, \ldots, s_p), \qquad p = \min(m,n).$$

Let $A_r$ be a close matrix of lower rank (say r): r < p. So

$$U^T A_r V = \mathrm{diag}(s_1, s_2, \ldots, s_r, 0, \ldots, 0).$$

Hence

$$U^T [A - A_r] V = \mathrm{diag}(0, \ldots, 0, s_{r+1}, \ldots, s_p).$$

Since the matrix 2-norm is equal to the highest singular
value, following (7.6.4),

$$\|A - A_r\|_2 = s_{r+1}.$$

Hence the smallest singular value of A is the 2-norm
distance of A to the set of all rank deficient matrices.
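The distance to the lower rank matrix, together with the energy relation (7.6.5), can be checked as follows. This is a minimal sketch in Python with NumPy; the random 6x4 matrix and the choice r = 2 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 4))                       # full-rank test matrix, p = 4

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # economy-size factors
r = 2
A_r = (U[:, :r] * s[:r]) @ Vt[:r]                 # keep the r largest singular values

dist_2 = np.linalg.norm(A - A_r, 2)               # 2-norm distance, expected s_{r+1}
energy = np.sum(A ** 2)                           # sum of squares of all elements
```

Note that the arrays are 0-based, so `s[r]` is the singular value $s_{r+1}$ of the text.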

Effects of changes of matrix size

The ratio of the largest singular value of A to the smallest
nonzero singular value of A is called the condition number.

If a column is added to the mxn matrix A where m ≥ n,
the largest singular value increases and the smallest
singular value decreases, and hence the condition number
increases; the result of deletion of a column is the
opposite. If m < n, with addition of a column the largest
singular value increases but the smallest singular value
does not decrease.

Stability against perturbations

Singular values of a matrix show a high degree of stability.
Perturbations in the elements of a matrix can cause
perturbations of equal or smaller magnitudes in the singular
values as follows.

If the mxn matrix A (m ≥ n) is perturbed by $\tilde{A}$
resulting in $A^*$,

$$A^* = A + \tilde{A}, \text{ then}$$

$$|s_i^* - s_i| \leq \tilde{s}_1 = \|\tilde{A}\|_2,$$

where $s_i^*$, $s_i$ and $\tilde{s}_i$ are the singular values of $A^*$, A
and $\tilde{A}$ respectively, and $\tilde{s}_1$ is the largest singular value
of $\tilde{A}$. The limit on the perturbation in the singular values
can also be expressed by the Wielandt-Hoffman theorem:

$$\sum_{i=1}^{n} (s_i^* - s_i)^2 \leq \sum_{i=1}^{m} \sum_{j=1}^{n} \tilde{a}_{ij}^2 = \|\tilde{A}\|_F^2,$$

$\tilde{a}_{ij}$ being the jth element of the i-th row of $\tilde{A}$.
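Both bounds can be observed on a randomly perturbed matrix. The sketch below uses Python with NumPy; the matrix size and the perturbation level are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 4))                        # original matrix, m >= n
A_tilde = 1e-3 * rng.normal(size=(6, 4))           # perturbation matrix
A_star = A + A_tilde                               # perturbed matrix

s = np.linalg.svd(A, compute_uv=False)             # singular values of A
s_star = np.linalg.svd(A_star, compute_uv=False)   # singular values of A*
s_tilde = np.linalg.svd(A_tilde, compute_uv=False) # singular values of the perturbation

max_shift = np.max(np.abs(s_star - s))             # largest change in any singular value
sum_sq_shift = np.sum((s_star - s) ** 2)           # left side of the Wielandt-Hoffman bound
frob_sq = np.sum(A_tilde ** 2)                     # squared Frobenius norm of the perturbation
```

`max_shift` does not exceed `s_tilde[0]`, and `sum_sq_shift` does not exceed `frob_sq`.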
Numerical robustness and implementation

Singular value decomposition is extremely robust numerically
compared with eigenvalue computations. Computer programs for
SVD are available in many high quality software packages,
e.g., LINPACK, EISPACK and MATLAB. The coding in FORTRAN is
also given in Lawson and Hanson (1974).


Example 7.6.2 Perform the singular value decomposition of
the matrix A given by

       3  9
A =    4  2
       7  3

SVD of A produces

       0.7436   0.6678   0.0324
U =    0.3455  -0.3424  -0.8737 ,      V =   0.6420  -0.7667
       0.5724  -0.6609   0.4854              0.7667   0.6420

and the singular values: $s_1$ = 11.8695, $s_2$ = 5.2071. Note
that conforming to (7.6.5), the energy in A is 168, which is
equal to $s_1^2 + s_2^2$.

Remark: One important feature of SVD is the ability to
characterize the periodicity present in the data, which is
discussed next. The use of SVD for determination of the
periodicity in the data is treated in Sec.11.4.


7.7 CHARACTERIZATION OF PERIODIC PROCESSES 
USING SVD 

SVD offers one of the most robust approaches to the 
analysis, modelling and prediction of data series with 
periodic excursions. In this section, the characterization 
of the periodic series in terms of the decomposition 
components U, S, and V is considered. 

Arranging the data 

For SVD based analysis, the data have to be arranged into a
matrix. Consider a process or data sequence:

{x(.)} = {x(1), x(2), ...}.

If the series {x(.)} is periodic, with period length n, M
consecutive n-long periods can be arranged into a matrix X,
such that the consecutive periods occupy the consecutive
rows of X, as follows:

$$X = \begin{bmatrix} x(1) & x(2) & \ldots & x(n) \\ x(n+1) & x(n+2) & \ldots & x(2n) \\ \vdots & \vdots & & \vdots \\ x((M-1)n+1) & x((M-1)n+2) & \ldots & x(Mn) \end{bmatrix}. \qquad (7.7.1)$$

SVD of the Mxn matrix X is given by

$$X = USV^T, \qquad (7.7.2)$$

where $U = [u_1\ \ldots\ u_M]$ and $V = [v_1\ \ldots\ v_n]$ are MxM and nxn
orthogonal matrices respectively, and S is a diagonal
matrix of singular values $s_1$ to $s_p$, with $s_1 \geq \ldots \geq s_p \geq 0$,
p = min (M,n).

The left ($u_1$ etc.) and the right ($v_1$ etc.) singular
vectors form a basis for the column-space and the row-space
of X respectively.

Characterization 

Consider the following types of periodic processes.

Case 1 The series {x(.)} is strictly periodic with period
length n, that is

x(k) = x(k+n).

Here, all the rows of X will be identical, and X will be a
rank-one matrix. SVD of X will produce one nonzero singular
value $s_1$, all other singular values being zero. Hence
$X = u_1 s_1 v_1^T$. $v_1^T$, the first row of $V^T$, will represent the
pattern or the normalized distribution of the series over
one period; the elements of the M-vector $u_1 s_1$ represent the
scaling factors for each row of the data matrix X.

Let

$$u_1 s_1 = z_1 = [z_{11}\ z_{21}\ \ldots\ z_{i1}\ \ldots\ z_{M1}]^T. \qquad (7.7.3)$$

Thus the i-th row of X will be given by $z_{i1} v_1^T$. Since {x(.)}
is perfectly periodic, the M elements of $z_1$ will be
identical. See Example 7.7(1).

Case 2 The series {x(.)} is nearly periodic with fixed
period length n but x(k) is not necessarily equal to
x(k+n).

There are two main possibilities:

(a) The series {x(.)} has the same repeating pattern but
with different amplitudes over different periods.

In such a case the rows of X will be scalar multiples
of each other, and hence X will still be a rank-one matrix.
$v_1$ will represent the periodic pattern but the elements of
$z_1$ in (7.7.3) will be different. See Example 7.7(1).

(b) The series {x(.)} has a nearly repeating pattern with
different amplitudes over different periods.

In such a case, X can be a full rank matrix. SVD of X
will still produce one dominant singular value $s_1$, the other
singular values being insignificantly small: $s_1 \gg s_2$. So
the rank-one approximation, $X \approx u_1 s_1 v_1^T$, will be permissible.
The Airline traffic series discussed in Example 7.7(2)
belongs to this category.

Figure 7.7.1 (a) A strictly periodic series, and
(b) a nearly periodic series with repeating pattern.


Example 7.7(1) SVD characterization of two repeating 
periodic series 

Fig.7.7.1(a) shows a strictly periodic series {x(.)}, and
Fig.7.7.1(b) shows a periodic series {y(.)} with a repeating
pattern, which is differently scaled over different periods.
The 4 consecutive periods of {x(.)} and {y(.)} are arranged
into matrices A and B respectively, and are singular value
decomposed as follows.

        3.00  2.00  5.00  3.25  2.75  2.25
A =     3.00  2.00  5.00  3.25  2.75  2.25
        3.00  2.00  5.00  3.25  2.75  2.25
        3.00  2.00  5.00  3.25  2.75  2.25

$A = U_A S_A V_A^T$, where

S_A = diag {15.6445 0 0 0 : 0},

        0.5000  -0.5000  -0.5000  -0.5000
U_A =   0.5000   0.8333  -0.1667  -0.1667
        0.5000  -0.1667   0.8333  -0.1667
        0.5000  -0.1667  -0.1667   0.8333

        0.3835  -0.9235   0        0        0        0
        0.2557   0.1062  -0.6921  -0.4499  -0.3807  -0.3115
V_A =   0.6392   0.2654   0.6248  -0.2439  -0.2063  -0.1688
        0.4155   0.1725  -0.2439   0.8415  -0.1341  -0.1097
        0.3516   0.1460  -0.2063  -0.1341   0.8865  -0.0929
        0.2876   0.1195  -0.1688  -0.1097  -0.0929   0.9240

Again

        1.50  1.00  2.50  1.625  1.375  1.125
B =     2.40  1.60  4.00  2.600  2.200  1.800
        2.70  1.80  4.50  2.925  2.475  2.025
        1.80  1.20  3.00  1.950  1.650  1.350

$B = U_B S_B V_B^T$, where

S_B = diag {11.2270 0 0 0 : 0},

        0.3484  -0.9326  -0.0837  -0.0440
U_B =   0.5574   0.2807  -0.7795  -0.0534
        0.6271   0.1640   0.4661   0.6022
        0.4180   0.1569   0.4099  -0.7954

        0.3835  -0.6286   0.6693  -0.0236   0.0963   0
        0.2557   0.1344  -0.1350   0.3351   0.8791  -0.1149
V_B =   0.6392   0.5996   0.2495   0.1614  -0.3260  -0.1930
        0.4155  -0.2935  -0.5226  -0.4550  -0.0493  -0.5086
        0.3516  -0.3185  -0.4445   0.5277  -0.2601   0.4809
        0.2876   0.1993  -0.0286  -0.6129   0.2037   0.6780

Note that A and B are rank-one matrices, which is also evident
from both $S_A$ and $S_B$ having one nonzero singular value. Since
{x(.)} is strictly periodic, all elements of the first
column of $U_A$ are the same. The first row of $V_B^T$ gives the
pattern of {y(.)}, which is scaled by the elements of the
vector $u_{B1} s_{B1}$, where $u_{B1}$ is the first column of $U_B$, and $s_{B1}$
is the first singular value of $S_B$ (=11.2270).
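The example can be reproduced as follows. This is a minimal sketch in Python with NumPy (an assumption of this illustration); the row scalings 0.5, 0.8, 0.9 and 0.6 are not stated in the text but are read off from the rows of B, each of which is a multiple of the common pattern. Since singular vectors are determined only up to sign, the sign is normalised before comparison.

```python
import numpy as np

pattern = np.array([3.00, 2.00, 5.00, 3.25, 2.75, 2.25])   # one period of {x(.)}
scales = np.array([0.5, 0.8, 0.9, 0.6])                    # row scalings read off from B

A = np.tile(pattern, (4, 1))          # strictly periodic: four identical rows
B = np.outer(scales, pattern)         # same pattern, different amplitude in each period

U_A, s_A, Vt_A = np.linalg.svd(A)
U_B, s_B, Vt_B = np.linalg.svd(B)

sign_B = np.sign(Vt_B[0, 0])          # fix the arbitrary sign of the first singular pair
v_B1 = sign_B * Vt_B[0]               # periodic pattern (unit norm), first row of V_B^T
z_B1 = sign_B * U_B[:, 0] * s_B[0]    # scaling factor of each period, as in (7.7.3)
```

Each row of B is recovered as the corresponding element of `z_B1` times `v_B1`, and the elements of `z_B1` are in the ratio 0.5 : 0.8 : 0.9 : 0.6.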
Summary

If a periodic series {x(.)} is arranged into a matrix X,
with the periods aligned into the rows of X, and if
$X = USV^T$,

(i) $v_1^T$ represents the periodic pattern of {x(.)}, and the
elements of $z_1 = u_1 s_1 = [z_{11}\ z_{21}\ \ldots\ z_{i1}\ \ldots\ z_{M1}]^T$ will be
the scaling factors.

(ii) if {x(.)} has a strictly repeating pattern, only $s_1$ is
nonzero while $s_2 = \ldots = s_p = 0$; the i-th row of X will be
given by $z_{i1} v_1^T$.

(iii) if {x(.)} is periodic with a nearly repeating pattern,
$s_1 \gg s_2$; the i-th row of X will be given by $z_{i1} v_1^T$ through
rank-one approximation of X, that is $X = USV^T \approx u_1 s_1 v_1^T$.

Degree of periodicity 

The total energy in X is given by

$$Q_X = \sum_{i=1}^{M} \sum_{j=1}^{n} (x_{ij})^2 = \|X\|_F^2 = s_1^2 + s_2^2 + \ldots + s_p^2.$$

The degree of periodicity depends on the percentage of the
total energy contained in the most dominant decomposition
component (that is $s_1^2$). So T, the degree of periodicity, may
be given by

$$T = s_1/s_2. \qquad (7.7.4)$$

An alternative expression for periodicity can be 



For a perfectly periodic series, T is ∞ while T' = 1. The
closeness of a nearly periodic series to the rank-one
approximated series can be assessed using (7.7.4) and
(7.7.5). Assessment of the degree of periodicity is central
to the concept of periodic decomposition discussed in
Sec.11.5 and the SVR spectrum discussed in Appendix 11.
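The assessment can be sketched numerically. In the following Python (NumPy) illustration the function name, the zero tolerance and the test data are assumptions; the ratio T follows (7.7.4), while the second quantity, the fraction of the total energy $Q_X$ carried by $s_1^2$, is only one natural measure suggested by the discussion above and is not claimed to be the expression (7.7.5).

```python
import numpy as np

def degree_of_periodicity(X):
    """Return (T, energy_fraction) for a data matrix X with at least two rows and columns."""
    s = np.linalg.svd(X, compute_uv=False)
    tol = max(X.shape) * np.finfo(float).eps * s[0]      # below this, s_2 is treated as zero
    T = np.inf if s[1] <= tol else s[0] / s[1]           # ratio (7.7.4)
    energy_fraction = s[0] ** 2 / np.sum(s ** 2)         # share of Q_X in s_1^2 (assumed measure)
    return T, energy_fraction

pattern = np.array([3.00, 2.00, 5.00, 3.25, 2.75, 2.25])
X_strict = np.tile(pattern, (4, 1))                      # strictly periodic series
rng = np.random.default_rng(4)
X_near = X_strict + 0.05 * rng.normal(size=X_strict.shape)   # nearly periodic series

T_strict, frac_strict = degree_of_periodicity(X_strict)
T_near, frac_near = degree_of_periodicity(X_near)
```

For the strictly periodic matrix the ratio is infinite (or extremely large, depending on rounding) and the energy fraction is 1; for the perturbed matrix the ratio is finite and the energy fraction is slightly below 1.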

Remark

The term ‘nearly periodic’, in the present context, refers
to deviation from periodicity in terms of either the
pattern, the period length, or the magnitudes over the
periods. The SVD based characterization of nearness to
periodicity is an assessment in terms of rank-oneness of the
appropriately configured data matrix. In mathematics, a
closely related term is ‘almost periodic functions’, which
concerns the period lengths being almost the same; such
functions have been rigorously treated in Corduneanu (1968).


Example 7.7(2) SVD analysis of nearly periodic series with 
fixed periodicity 

Consider SVD analysis of 

(i) the Trans-Atlantic Airline Passenger series, and 

(ii) the Homogeneous Indian Rainfall series. 

Both the series contain monthly data and are nearly periodic 
with (fixed) yearly periodicity. 

For the Trans-Atlantic Airline Passenger series shown
in Fig.4.3.3, a 4×12 moving window, say A(k), is considered
moving over a 12×12 data set X, containing monthly data for
1949 to 1960 (see Appendix 7A.2); so A(1) will be composed
of the first 4 rows of X, A(2) will be composed of the 2nd to
5th rows of X and so on. The singular values are computed
for each A(k), for k = 1 to 6. Both the data and the
singular values are given in Appendix 7B.2. The progressive
distribution of the singular values of {A(k)}, normalized by
$s_1$ for each window k, is shown in Fig. 7.7.2.

A similar exercise is performed on the homogeneous
Indian Rainfall series (Appendix 7F) shown in Fig. 2.2.1. The
data from 1940 to 1959 are used here; a 10×12 A(k) is
considered in this case. The distribution of the singular
values (normalized by $s_1$) obtained from the SVD of A(k),
for k = 1 to 6, is shown in Fig.7.7.3.

For the Airline Passenger series, a sharp edge in Fig.
7.7.2 indicates the presence of one dominant singular value
for the different windows {A(k)} over the data series,
whereas the distribution of the singular values is not that
sharp for the rainfall series, which indicates that the
degree of repetitiveness of the periods in the rainfall
series is comparatively low.

Similar analysis applied to the quasiperiodic processes
appears in Sec. 11.2.
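
The moving-window computation used in this example can be
sketched as follows. The data here are a synthetic stand-in (a
growing seasonal pattern with a little noise), not the airline or
rainfall series, and the function name is hypothetical.

```python
import numpy as np

def windowed_singular_values(X, m):
    """Singular values of each m-row window of X, normalized by s_1."""
    X = np.asarray(X, dtype=float)
    profiles = []
    for k in range(X.shape[0] - m + 1):        # window A(k+1): rows k .. k+m-1
        s = np.linalg.svd(X[k:k + m], compute_uv=False)
        profiles.append(s / s[0])
    return np.array(profiles)

# Synthetic stand-in: 12 "years" of 12 "months" with steady growth.
rng = np.random.default_rng(1)
pattern = 1.0 + 0.3 * np.sin(2 * np.pi * np.arange(12) / 12)
growth = 1.1 ** np.arange(12)
X = np.outer(growth, pattern) + 0.01 * rng.standard_normal((12, 12))

profiles = windowed_singular_values(X, m=4)
print(profiles.shape)                          # one row of values per window
print(np.round(profiles[0], 4))
```

A sharp drop after the first normalized value in every row
corresponds to the "sharp edge" referred to above.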


7.8 MODELLING AND PREDICTION USING SVD 

It is assumed that the periodic data {x(.)} are arranged
into an M×n matrix X as discussed in the last section. The
objective is to model the series {x(.)} and to produce
one-period ahead prediction, that is predict the (M+1)th
row of X.

Figure 7.7.2 The progressive distribution of normalized
singular values for the Airline traffic series.

Figure 7.7.3 The progressive distribution of normalized
singular values for the Rainfall series.

7.8.1 Principle of Modelling

Consider an m×n data window moving (downwards) over X (where
m<M); let the so formed matrix sequence be {A(i)}. The k-th
data window forms A(k), which is composed of the last m rows
of X. For each value of k, A(k) is singular value
decomposed, and the series {x(.)} is modelled in terms of the so
obtained SVD components.

Let the SVD of A(i) be given by

    $A(i) = U(i) S(i) V^T(i)$.

Periodic prediction can be computed as follows. Let
A(1),...,A(k) be the consecutive m×n data windows of X. The
singular value decomposition of A(k) is performed for each
value of k.

The modelling and prediction policy is based on two 
main assumptions: 

(i) only the first singular value of A(k) is predominant,
for any k, and

(ii) $v_1(k)$ can be assumed to remain almost unchanged between
the two consecutive positions A(k-1) and A(k).

The case when more than one singular value is dominant is
discussed in the Remarks later in this section.

Define

    $z_{m1}(k) = u_{m1}(k) s_1(k)$.    (7.8.1)

The propagation of the sequence {$z_{m1}(k)$} can be represented
by any suitable discrete-time model. Typically an AR model
may be considered:

    $F(q^{-1}) z_{m1}(k) = e(k)$,    (7.8.2)

where

    $F(q^{-1}) = 1 + f_1 q^{-1} + f_2 q^{-2} + \ldots + f_N q^{-N}$,

and {e(k)} is a sequence of uncorrelated noise. The model
(7.8.2) may be used to produce the prediction $z_{m1}(k+1|k)$;
the consequent prediction of one complete period
following A(k) is given by

    $a_{(m+1)1}(k+1|k) = z_{m1}(k+1|k) v_1^T(k)$.    (7.8.3)

Summary

(1) Form M×n matrix X from M consecutive periods each of
length n.

(2) Choose a moving data window size m×n for A(k).

(3) Starting from the top of X move the data window A(k)
over X, one row down at a time; for each k, perform SVD
of A(k) and store $z_{m1}(k) = u_{m1}(k) s_1(k)$.

(4) Model the {$z_{m1}(k)$} process as in (7.8.2) and produce the
one-step prediction $z_{m1}(k+1|k)$.

(5) Compute one-period prediction: $z_{m1}(k+1|k) v_1^T(k)$.
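
The five steps can be sketched in code as below. This is a minimal
illustration under stated assumptions, not the procedure as
implemented for the case studies: the function name is
hypothetical, a first-order AR model with a constant term is used
for step (4), the sign of each SVD component is fixed by requiring
the elements of v_1 to sum to a positive value (the SVD itself
leaves this sign undetermined), and the test data are synthetic.

```python
import numpy as np

def svd_period_prediction(X, m):
    """One-period-ahead prediction from rank-one SVD components of m-row windows."""
    X = np.asarray(X, dtype=float)
    M = X.shape[0]
    z, v_last = [], None
    for k in range(M - m + 1):                    # step (3): slide the window
        U, s, Vt = np.linalg.svd(X[k:k + m], full_matrices=False)
        u1, v1 = U[:, 0], Vt[0]
        if v1.sum() < 0:                          # assumed sign convention
            u1, v1 = -u1, -v1
        z.append(u1[-1] * s[0])                   # z_m1(k) = u_m1(k) s_1(k)
        v_last = v1                               # v_1 of the latest window
    z = np.array(z)
    # Step (4): least squares fit of z(k) = f0 + f1 z(k-1) (assumed model order).
    A = np.column_stack([np.ones(len(z) - 1), z[:-1]])
    (f0, f1), *_ = np.linalg.lstsq(A, z[1:], rcond=None)
    z_next = f0 + f1 * z[-1]
    return z_next * v_last                        # step (5)

# Synthetic check: a fixed pattern growing by 5% per period.
pattern = 2.0 + np.sin(2 * np.pi * np.arange(12) / 12)
X = np.outer(1.05 ** np.arange(10), pattern)
pred = svd_period_prediction(X, m=4)
expected = 1.05 ** 10 * pattern
print(np.max(np.abs(pred - expected)))
```

For real data the prediction is only as good as the two
assumptions stated earlier in this section.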

Remarks 

(1) The assumption that the first singular value is dominant
is not a limitation. If the second singular value is also
dominant, and if $v_2(k)$ remains reasonably unchanged between
two successive windows k and k+1, the same procedure may be
applied to predict $z_{m2}(k+1|k) v_2^T(k)$; the overall prediction
will be $z_{m1}(k+1|k) v_1^T(k) + z_{m2}(k+1|k) v_2^T(k)$. Thus any number
of additive components may be accommodated in the
prediction.

(2) The choice of the data window length m is restricted by 
the amount and the nature of data available; the highest 
value can be typically <(M-6); the smallest value should be 
less than or equal to the number of periods over which the 
dynamics of the process remains relatively steady. 


7.8.2 Case Study: Periodic Prediction of Airline Traffic 

The Airline traffic series has been discussed in Example 
7.7(2) in Sec.7.7. The objective is to produce periodic 
(i.e. yearly or one- to 12-step ahead) prediction of this 
monthly data series and to study the robustness of the 
prediction procedure. 

Prediction 

Here n = 12, and m is chosen to be 4. The data for the first
10 years are used to produce the prediction for the 11th year,
and the data up to the 11th year are used to predict the
12th year. The data are (natural) logarithmically
transformed before performing SVD to reduce the variance and
periodic variability in the data.

The singular values for A(1) to A(7) show that $s_1$ is
dominant. The overlaid plot of $v_1(1)$ to $v_1(7)$ in Fig.7.8.1
shows $v_1(k)$ remaining reasonably steady over the different
data windows. So the two basic assumptions for the present
method of modelling as stated in Sec. 7.8.1 are valid.

Figure 7.8.1 Overlaid plot of $v_1(1)$ to $v_1(7)$ for
log-transformed Airline traffic series.

The first 10 periods yield only 7 data points for the
{$z_{m1}(k)$} sequence, which is modelled as

    $z_{m1}(k) = f_0 + f_1 z_{m1}(k-1) + e(k)$,  m = 4,    (7.8.4)

where the least squares estimates for $f_0$ and $f_1$ work out as
$f_0$ = 2.197 and $f_1$ = 0.965. The prediction so produced,
$z_{m1}(11|10) v_1^T(10)$, is antilog transformed to produce
$a_{(m+1)1}(11|10)$. The procedure is repeated to produce
$a_{(m+1)1}(12|11)$. The prediction results are shown in
Fig.7.8.2; the MSE per sample is found to be 354.28 for the
predicted series.

Remarks: Since the present method is based on the
decomposition components corresponding to the most dominant
singular value, it can be used even if the data are
contaminated with noise; this is because aperiodic noise
will be largely associated with the smaller singular values.

Robustness 

The robustness of the modelling and prediction procedure
against additive white Gaussian noise is now considered; the
performance of the SVD based method is compared with that of
a multiplicative model.

The log-transformed Airline traffic series has been
modelled by Box and Jenkins (BJ) (1976, p.306) as

    $z(k) = (1 - \theta_1 q^{-1})(1 - \theta_2 q^{-12}) e(k)$,    (7.8.5)

where the estimated parameters of this multiplicative model
are stated to be $\theta_1$ = 0.4 and $\theta_2$ = 0.6.

Figure 7.8.2 One- to 24-step ahead prediction of the
Airline traffic series for the 11th and the 12th years.

Figure 7.8.3 MSE per sample vs. signal to noise ratio (SNR
in dB) for periodic prediction using the SVD based method
and the BJ method on the Airline traffic series with added
white Gaussian noise.

The noise is added to the untransformed data. The
parameters of (7.8.5) are estimated with 20 different levels
of noise by finding the minimum of the error sum of squares
surface. For each noise level 1000 different noise patterns
are tried, for each of which the parameters are estimated;
the averaged parameter values are used for computing the
prediction. The MSE for one- to 12-step ahead prediction by
both the methods is computed for the 11th and the 12th
years. As shown in Fig.7.8.3, while at high signal to noise
ratio (SNR) the performances of the SVD based method and
the BJ method are comparable, at low SNRs the SVD based
method performs much better; the reason is that the effect
of noise is largely confined to the smaller singular values
which are ignored in the SVD based method.
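
A robustness study of this kind needs noise added at a prescribed
SNR. The sketch below shows one way of doing so; it assumes that
SNR is defined as the ratio of the mean-square value of the signal
to the noise variance, which is not stated explicitly in the text,
and the function name and test signal are hypothetical.

```python
import numpy as np

def add_white_noise(x, snr_db, rng):
    """Add zero-mean white Gaussian noise to x at the requested SNR (in dB)."""
    x = np.asarray(x, dtype=float)
    signal_power = np.mean(x ** 2)                 # assumed definition of power
    noise_power = signal_power / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

rng = np.random.default_rng(0)
x = 100.0 + 20.0 * np.sin(2 * np.pi * np.arange(20000) / 12)
y = add_white_noise(x, snr_db=10.0, rng=rng)

measured = 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))
print(round(measured, 2))                          # close to the requested 10 dB
```

Repeating the call with different random seeds gives the
different noise patterns at a given noise level.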


7.9 CONCLUSIONS 

The property of compaction of information through orthogonal 
transformation has been used for modelling and prediction of 
nearly periodic series. 

In this chapter, first the subject of orthogonal
transformation was introduced and the optimal transformation
through KLT was presented. Next the Walsh-Hadamard transform
(WHT) was discussed. The rest and the major part of this
chapter has been devoted to the study of singular value
decomposition (SVD). Both WHT and SVD offer efficient
procedures for the modelling and prediction of nearly
periodic processes. Particularly the SVD based approach is
very powerful and numerically stable.

The main attraction of WHT is that it is computationally
simple and is even hardware implementable. WHT
operates on vectors rather than matrices, which is also an
advantage. The transformed components are modelled, rather
than the time domain data. However, the degree of compaction
of information being much less compared with SVD, more data
are required for sensible modelling.

SVD has been widely used in the present text both for
modelling and analysis (in Chapters 3, 7, 8, 9, 10, 11 and
14) as well as a tool for matrix operations (in Chapters 3
and 12). Following an introduction to SVD and its
properties, the analysis and modelling of nearly periodic
series using SVD has been detailed in this chapter.

From the numerical point of view, the two attractive
features of SVD are (i) that it can be computed in a
numerically stable way, and (ii) that the singular values
are well conditioned. SVD offers the most robust and
definitive method for the determination of the rank of a
matrix. For the modelling of periodic signals, the special
structural feature of SVD can be made use of. The data are
configured into a matrix with the consecutive periods
occupying the consecutive rows of the matrix. The near
rank-oneness of the matrix reveals the periodic nature of
the data series. The series is modelled through the linear
modelling of the decomposition components, which contain
maximally compressed information about the periodic data.

Besides the numerical robustness, the proposed method
of modelling and prediction is applicable even if the data
are noisy and the period length as well as the periodic
pattern vary to a certain extent. The present discussions
have been confined to processes with fixed period length
only; processes with varying period length are studied in
Chapter 11.

It should be possible to automate the modelling and
prediction procedures using SVD as well as WHT, as
operator expertise is not a prerequisite.
CHAPTER 8


MODELLING OF NONLINEAR PROCESSES: AN INTRODUCTION 


Certain special features characterize a nonlinear 
process which can be represented by a single-stage 
or a multistage model, linear or nonlinear in the 
parameters. 


8.1 INTRODUCTION 

Most real-life processes are nonlinear to varying extents. A 
nonlinear process may be simply a time series, for example, 
the yearly averaged Sunspot activity process in astronomy 
(Fig. 8.2.4) or the river flow discharge in hydrology; or, it
may be a complex process with many independent inputs, whose 
influences on the output are imprecisely known, for example, 
the economic inflation process, or the quality of molten 
iron while being tapped from the blast furnace. Since 
nonlinearity can be of various types and time varying, and 
also since the data may not be sufficient or adequately 
informative, both structure selection and parameter 
estimation for a real-life nonlinear process can be quite 
complex. 

There are many approaches to nonlinear system modelling
(Billings, 1980; Haber and Unbehauen, 1990), each trying to
capture the nonlinear characteristics through one or more
nonlinear functional blocks; modelling usually involves
local or piece-wise linearization, otherwise nonlinear
optimization is performed. The suitability of any method of
modelling and identification largely depends on the nature
of the problem and the amount of information available on
the process. The cross validation tests (Sec.3.6.4) and the
quality of prediction obtained through the model will
demonstrate how closely the model represents the underlying
process. The present chapter is devoted to the study of
(a) the characteristic features of processes with
nonlinearity, and (b) the problems concerning modelling of
such processes; a summary of some selected classes of
nonlinear models is also presented.

The organization of this chapter is as follows. Some
basic concepts related to nonlinear processes and their
modelling are discussed in Sec. 8.2. The special features of
processes with nonlinear periodicity are studied in Sec. 8.3;
a comparative analysis using the state-space diagrams, the
singular value decomposition (SVD) based characterization
and frequency domain decomposition is also presented.
Finally a summary of some selected classes of well-studied
models is given in Sec. 8.4.


8.2 BASICS OF NONLINEAR PROCESSES 
8.2.1 Characteristic Features 

Features particular to nonlinear processes are as follows. 

(a) Algebraic manifestation

A basic manifestation of nonlinearity is that the principle
of superposition is not valid.

(b) Linearity/nonlinearity in the parameters

A nonlinear process may be represented by a
linear-in-the-parameter model:

    $f(x,a) = a_0 + a_1 x + a_2 x^2 + \ldots + a_m x^m$,    (8.2.1)

whereas a nonlinear model is nonlinear in the parameters;
for example,

    $f(x,\theta) = \theta_0 + \theta_1 x + \theta_2 e^{x\theta_3}$.    (8.2.2)

Nonlinearity of a nonlinear model is manifested in the
dependence of $\partial f/\partial\theta_i$ on any of the parameters, $\theta_i$. Thus
whether a model is linear or nonlinear depends on how
parameters enter the model, rather than how the variables
enter the model.
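
The distinction can be checked numerically. The sketch below
differentiates two models with respect to one parameter by central
differences at two different parameter settings. It assumes that
the second model has the form theta_0 + theta_1 x + theta_2
exp(x theta_3); the function names and the numerical values are
hypothetical.

```python
import math

def poly_model(x, a):
    """Linear-in-the-parameters polynomial: a0 + a1 x + a2 x^2 + ..."""
    return sum(a_j * x ** j for j, a_j in enumerate(a))

def exp_model(x, th):
    """Assumed nonlinear-in-the-parameters form: th0 + th1 x + th2 exp(x th3)."""
    return th[0] + th[1] * x + th[2] * math.exp(x * th[3])

def sensitivity(model, x, params, i, h=1e-6):
    """Central-difference estimate of d(model)/d(params[i])."""
    up, dn = list(params), list(params)
    up[i] += h
    dn[i] -= h
    return (model(x, up) - model(x, dn)) / (2 * h)

x = 0.5
# Polynomial: the derivative w.r.t. a2 is x^2 whatever the parameter values.
d_lin_1 = sensitivity(poly_model, x, [1.0, 2.0, 3.0], 2)
d_lin_2 = sensitivity(poly_model, x, [4.0, -1.0, 7.0], 2)
# Exponential model: the derivative w.r.t. th2 changes when th3 changes.
d_non_1 = sensitivity(exp_model, x, [1.0, 2.0, 3.0, 0.5], 2)
d_non_2 = sensitivity(exp_model, x, [1.0, 2.0, 3.0, 1.5], 2)
print(d_lin_1, d_lin_2)
print(d_non_1, d_non_2)
```

Note that both models are nonlinear in the variable x; only the
second is nonlinear in the parameters.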

(c) Stationarity and initial condition 

Visibly, stationarity implies no growth or decline in the
data with time. A stationary time series shows almost
constant mean and variance. The autocorrelation function of
a stationary series quickly drops close to zero from the
maximum, typically after the second or third time lag,
whereas the decrease is slow for a nonstationary process.

For a linear process, stationarity is a global
property, i.e. the process is either stationary or
nonstationary over all the space and time. On the other hand,
a nonlinear process is at the most locally stationary.
Besides, unlike the linear process, stability of a nonlinear
process is dependent on initial conditions.

(d) Irreversibility 

Irreversibility of time series data refers to the
probabilistic features not being retained upon reversal of 
the direction of time. Nonlinear processes are distinctly 
time-irreversible. A corollary to time-irreversibility is 
the fact that for a nonlinear system, the output to a normal 
input is not normal, whereas for a linear system the output 
is also normal. 

(e) Invertibility 

If y(t) is the output of a process driven by an unobservable
noise process e(t) then the model relating y(t) with e(t) is
invertible, if the estimates $\hat{e}(t)$ of e(t) can be produced
from y(t) such that the error, $\tilde{e}(t) = e(t) - \hat{e}(t)$, tends to
zero in some statistical sense as the number of observations
y(t) tends to infinity; thus invertibility implies
reconstructibility of the input from the output. Nonlinear
processes can only be conditionally invertible.

(f) Periodicity 

If a time series tends to repeat at a certain time interval,
it is referred to as a periodic signal or a periodic
process. The three basic attributes of periodicity are the
period-length, the pattern over a period and the magnitude
of the pattern; if these features remain unchanged, the
process is called perfectly periodic, otherwise the process
will be aperiodic which can be quasiperiodic or chaotic. A
periodic process can be expressed by a linear model, whereas
aperiodicity is akin to nonlinearity. Further discussions on
nonlinear periodicity follow in Sec. 8.3.

(g) Steady-state response 

Response of a stable system has two components: a transient
and a steady-state component, the latter referring to
asymptotic response (i.e. response as time tends to
infinity). The transient component is expected to subside
with time, leaving only the steady-state component. If the
system is linear, the steady-state component will be either
constant or periodic. If the system is nonlinear, the
steady-state response will be bounded but can be
periodic, quasiperiodic or chaotic in nature.

8.2.2 Basic Models 

Some basic representations of nonlinear processes are
discussed here.

Polynomial models 

A large class of nonlinear processes can be expressed by 
polynomial or curvilinear models. 

For processes with one independent variable x, a
typical model can be:

    $y = a_0 + a_1 x + a_2 x^2 + \ldots + a_m x^m + \varepsilon$,    (8.2.3)

where $\varepsilon$ is the unmeasurable noise component, and m is the
order of the model. For example, the world population may be
modelled based on observations from 1920 to 1992 as

    $y(t) = 2022.98 - 17.301 t + 0.547 t^2 + 0.555 \times 10^{-3} t^3 + \varepsilon$,

(y in millions, and t=year-1900).
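
The polynomial can be evaluated directly, as in the sketch below.
It assumes that the cubic term reads 0.555 x 10^-3 t^3, which is
a reading of the printed coefficient rather than a value verified
against the original data; the function name is hypothetical and
the noise term is omitted.

```python
def world_population_millions(year):
    """Evaluate the cubic population model (noise term omitted)."""
    t = year - 1900
    # The cubic coefficient 0.555e-3 is an assumed reading of the printed value.
    return 2022.98 - 17.301 * t + 0.547 * t ** 2 + 0.555e-3 * t ** 3

pop_1920 = world_population_millions(1920)
pop_1992 = world_population_millions(1992)
print(round(pop_1920, 2), round(pop_1992, 2))
```

Evaluating the model far outside 1920 to 1992 shows the
explosive tendency of the high-power terms noted later in this
section.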

If two independent variables are used, a third-order model
may be stated as

    $y = a_0 + a_1 x_1 + a_2 x_1^2 + a_3 x_1^3 + b_1 x_2 + b_2 x_2^2 + b_3 x_2^3$
        $+ c_1 x_1 x_2 + c_2 x_1^2 x_2 + c_3 x_1 x_2^2 + \varepsilon$,    (8.2.4)

where both the power terms and the cross product terms are
considered.

Note that the models (8.2.3 - 8.2.4) are linear-in-the-parameter
models, which is an advantage. The explosive
tendency of square and high-power terms in the polynomial 
models can be a drawback. One of the ways to alleviate this 
problem is to consider different models based on some 
threshold criteria on the magnitude of some specific 
variables, similar to the threshold models (see Remark, 
Sec.8.4.3). 

It is always desirable to use a polynomial with the
minimum number of independent variables or regressors, to
model the process representatively. There is no universal
method available for selection of the optimum set of
variables in all types of models. Subset selection
(Secs. 3.6.2-3.6.4) may be used to eliminate redundancy in
the set of independent variables. For an example of
selection of an optimal nonlinear AR model, see Sec.9.4.

Figure 8.2.1 Hammerstein model of a nonlinear system.

Hammerstein model 

This model (Fig. 8.2.1) represents a nonlinear system by a
static, zero-memory nonlinear system followed by a system
with linear dynamics. If the nonlinearity $f_H(.)$ is known,
the model can be considered to be a linear model with the
input(s) {u(t)} transformed to {$f_H(u(t))$}. If the
nonlinearity is unknown, it may be expressed by a polynomial model;
for example,

    $r(t) = f_H(u(t)) = \alpha_1 u(t) + \alpha_2 u^2(t) + \ldots + \alpha_p u^p(t)$.    (8.2.5)

If the linear system is modelled as

    $A(q^{-1}) y(k) = B(q^{-1}) r(k-d)$,    (8.2.6)

where

    $A(q^{-1}) = 1 + a_1 q^{-1} + \ldots + a_n q^{-n}$,
    $B(q^{-1}) = b_0 + b_1 q^{-1} + \ldots + b_n q^{-n}$,

and d is the discrete time-delay between y(k) and r(k), the
overall model response is given by

    $y(k) = (1 - A(q^{-1})) y(k) + b_0 f_H(u(k-d)) + b_1 f_H(u(k-d-1))$
        $+ \ldots + b_n f_H(u(k-d-n))$.    (8.2.7)

The parameters of (8.2.7) can be estimated as in the case of a
linear model. The model (8.2.7) is known as the Hammerstein
model. The measurement y(k) in (8.2.6) may be noisy.
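
A noise-free simulation of such a model can be sketched as
follows. The function and argument names are hypothetical, the
static nonlinearity is taken to be a polynomial without a constant
term, and all signals are assumed to be zero before k = 0.

```python
def hammerstein(u, alpha, a, b, d):
    """Simulate y(k) = -sum_i a_i y(k-i) + sum_j b_j f_H(u(k-d-j)).

    alpha: polynomial coefficients of the static nonlinearity f_H
    a:     [a_1, ..., a_n] of A(q^-1); b: [b_0, ..., b_n] of B(q^-1)
    d:     discrete time-delay
    """
    def f_h(v):
        return sum(c * v ** (i + 1) for i, c in enumerate(alpha))

    r = [f_h(v) for v in u]                  # static nonlinearity first
    y = []
    for k in range(len(u)):                  # then the linear dynamics
        acc = 0.0
        for i, a_i in enumerate(a, start=1):
            if k - i >= 0:
                acc -= a_i * y[k - i]
        for j, b_j in enumerate(b):
            if k - d - j >= 0:
                acc += b_j * r[k - d - j]
        y.append(acc)
    return y

# Unit step input, f_H(u) = u + 0.5 u^2, y(k) = 0.5 y(k-1) + r(k-1).
y = hammerstein([1.0] * 60, alpha=[1.0, 0.5], a=[-0.5], b=[1.0], d=1)
print(y[:3], y[-1])
```

Since r(k) can be computed from u(k) once f_H is fixed, the
remaining part of the model is an ordinary linear difference
equation.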

Wiener model 

A basic Wiener model (Fig. 8.2.2) consists of a system with
linear dynamics followed by a zero-memory system with static
nonlinearity. A more complete Wiener model is shown in Fig.
8.2.3, which has three distinct sections. The input u(t) is
fed to a series of linear dynamical systems, $h_i(t)$, which
are orthogonal functions. The next section is a multi-input
multi-output nonlinear system G(.) with no memory, the
weighted sums of the outputs of which yield the single
output y(t). Wiener had considered (Wiener, 1958) Laguerre
functions for $h_i(t)$ and Hermite polynomials for G(.);
excessive computational requirement of the generalized
Wiener model led to the development of many simplified
representations as discussed in Schetzen (1980).

Figure 8.2.2 A basic Wiener model.



Figure 8.2.3 A general model of nonlinear systems of 
the Wiener class. 

Nonlinear regression 

Nonlinear regression is used to estimate the parameters of a
nonlinear-in-parameters model. The estimation problem can be
formulated as a nonlinear optimization problem which can be
solved using standard algorithms, e.g., the Gauss-Newton
algorithm. Another approach can be linear extrapolation followed
by iterative solution. Details of nonlinear regression are
beyond the scope of this book; the reader may refer to
Draper and Smith (1981) and Bard (1974).


8.2.3 Nonlinear Transformation 

The prime objective of nonlinear transformation is to expand
the operating region, over which a simplified model of the
underlying nonlinear process will be valid. In other words,
the purpose of nonlinear transformation is to prepare the
data for familiar methods of analysis, which cannot
otherwise be used.

Nonlinear transformation can be performed both for time
series data and input-output data. The usual practice is to
transform either the dependent variable or the independent
variable(s) or both. The graphical display of the dependent
and the independent variables can help decide which
variables need transformation. Subjective knowledge of the
underlying process, if available, deserves primary
consideration in deciding the type of transformation.

The two basic objectives of nonlinear transformation 
are linearization and variance equalization. 

Linearization is performed when the time series shows a
nonlinear trend or when a nonlinear relationship is observed
between the dependent (or the response) variable and the
independent (or the explanatory) variables. Some basic
transformations z of y for linearization are as follows

    $z = y^2,\ y,\ \sqrt{y},\ \log y,\ 1/\sqrt{y},\ 1/y,\ 1/y^2,\ \ldots,\ 1/y^n$,

n being a positive integer. In general, for multiplicative
nonlinearity, logarithmic transformation is used.

Equalization of variance is necessary when the data
show nonuniform spread or variability. For example, the
variability in the Sunspot series (Appendix 8A) is much
reduced when logarithmically transformed, as shown in
Fig. 8.2.4.

Figure 8.2.4 (a) The yearly averaged sunspot series
{y(k)}, and (b) the transformed sunspot series {z(k)}
where $z(k) = \log_e(y(k)+10)$.

Some selected cases are mentioned below where particular
transformations result in constant variance subject to some
specific statistical relationships.

(a) Square-root transformation is used when variance is
proportional to the expected value (i.e. $z = \sqrt{y}$, if
$\mathrm{Var}(y) \propto E(y)$).

(b) Logarithmic transformation is used when standard
deviation is proportional to expected value (for
example, $z = \log y$, if $\sigma(y) \propto E(y)$).

(c) Reciprocal transformation is used when standard
deviation is proportional to the square of the expected
value (i.e. $z = 1/y$, if $\sigma(y) \propto (E(y))^2$).

One of the widely used and studied transformations is the Box
and Cox transformation (Box and Cox, 1964):

    $y(\lambda) = (y^{\lambda} - 1)/\lambda$,  $\lambda \neq 0$,
    $y(\lambda) = \log y$,  $\lambda = 0$,

where the real unknown parameter $\lambda$ is estimated by the maximum
likelihood method so that the transformed data {$y(\lambda)$} are
independent and normally distributed with constant variance.
See Atkinson (1985) for detailed discussions along with
modifications. In such types of transformations where the
parameters of transformation are estimated from the data, it
is necessary to eliminate outliers, as otherwise the
transformations can be adversely influenced.
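
For a given value of the parameter, the transformation itself is
simple to apply. The sketch below treats the parameter (called
lam here, a hypothetical name) as known; the maximum likelihood
estimation mentioned above is not attempted.

```python
import math

def box_cox(y, lam):
    """Box and Cox transformation of a positive value y for a given lam."""
    if y <= 0:
        raise ValueError("the transformation requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

sqrt_like = box_cox(4.0, 0.5)      # (sqrt(4) - 1) / 0.5
log_case = box_cox(math.e, 0.0)    # log(e)
near_zero = box_cox(3.0, 1e-8)     # approaches log(3) as lam tends to 0
print(sqrt_like, log_case, near_zero)
```

The last value illustrates why the logarithm is the natural
definition at lam = 0: the power form tends to log y as lam tends
to zero.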

Since saturation is inevitable with all real-life 
variables, sigmoidal type nonlinearity is also often used 
for transformation in nonlinear modelling. Sigmoidal 
nonlinearity is particularly popular with Neural Networks 
(see Sec. 10.2).

When a linear model is used with the transformed data, 
it must be remembered that the disturbance term has also 
been transformed. Since linear regression requires 

(a) additivity of the dependent terms, 

(b) constancy of variance, and 

(c) normal distribution of the data; 

the degree of validity of these conditions needs to be 
reassessed when transformed data are used. 


8.3 NONLINEAR PERIODICITY 

Data series or signals with certain degrees of periodicity
broadly fall into three categories: (i) periodic, (ii)
quasiperiodic, (iii) chaotic series. In this section, the
features of these classes of series are stated, and then
their analysis is presented.


8.3.1 Periodic, Quasiperiodic and Chaotic Series 
Periodic series 

A data series is said to be strictly periodic if

    $x(t) = x(t+T)$,

for all t, where the time period T (>0) remains unchanged. The
process will be nearly periodic if

    $x(t) \approx x(t+T)$.

Periodic series are discussed in Sec. 7.7.1.

The frequency spectrum of a periodic series will
comprise the fundamental component, f = 1/T, with the
harmonic components at $f_n = n/T$, n = 2,3,... etc.

Remarks 

(a) A periodic series shows a repeating pattern, the 
repetition frequency being the same as the periodicity. 

(b) A periodic series may be generated from a linear 
combination of a number of sinusoids; the frequency 
component corresponding to the period length of the 
composite series will have the maximum magnitude. 

Quasiperiodic series 

A quasiperiodic series is a linear combination of two or
more periodic series each of whose frequencies is a linear
combination of a finite set of frequencies, at least two of
which are incommensurate.

Remarks 

(a) In the present context, the introduction of the following
terms is in order.

(1) A number which can be expressed as a ratio of two 
integers is called a rational number. 

(2) For a discrete-time series the normalized frequency 
(i.e. normalized with respect to the sampling 
frequency) has to be rational to be periodic. 

(3) Two frequencies are called incommensurate if their 
ratio is not a rational number. 

(b) If frequencies of at least two components are not
incommensurate as stated above, the series can be locally
quasiperiodic but globally periodic.

Consider the quasiperiodic series given by

    $x(t) = \sum_{j=1}^{n_Q} g_j(t)$,

where $f_j$, the frequency of $g_j$, is a linear function of a set
of linearly independent base frequencies, $r_1, r_2, \ldots, r_N$:

    $f_j = a_1 r_1 + a_2 r_2 + \ldots + a_N r_N$,

$a_1, \ldots, a_N$ being integers.

The base frequencies are not uniquely defined but the
number of base frequencies, N, is characteristic of a
quasiperiodic series, which is also called N-periodic. A
periodic series is a quasiperiodic series with N=1. The
frequency spectrum of an N-periodic series shows N sets of
harmonics at discrete frequencies. For $g_1$, the harmonics are
at $f_1$, $2f_1$, $3f_1$, ... etc. A quasiperiodic series can also be
generated through nonlinear interaction of two or more
periodic series.
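
The role of rational frequencies in discrete time can be
illustrated with a short sketch. It considers sinusoids whose
normalized frequencies are given as exact fractions; the function
names are hypothetical. An incommensurate pair of frequencies has
an irrational ratio and so cannot be represented in this way,
which is precisely why no common period exists in that case.

```python
import math
from fractions import Fraction
from functools import reduce

def period_of(freq):
    """Samples per period of sin(2*pi*f*k) for a rational normalized frequency f."""
    return Fraction(freq).denominator       # Fraction keeps f = p/q in lowest terms

def common_period(freqs):
    """Smallest number of samples after which every component repeats."""
    return reduce(lambda a, b: a * b // math.gcd(a, b),
                  (period_of(f) for f in freqs))

freqs = [Fraction(1, 12), Fraction(1, 8)]
T = common_period(freqs)

def x(k):
    return sum(math.sin(2 * math.pi * float(f) * k) for f in freqs)

max_gap = max(abs(x(k) - x(k + T)) for k in range(100))
print(T, max_gap)
```

Within a span shorter than the common period the sum does not
repeat, although it is globally periodic.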

The yearly averaged Sunspot activity series (Fig.
8.2.4(a)) is a typical example of a quasiperiodic series.
The state-space diagram (say, x(k-1) vs. x(k) plot) of a
quasiperiodic series is distinctly different from that of a
periodic or a chaotic series, as shown in Example 8.3.2.

Chaotic series 

A chaotic series shows the following distinguishing
features:

(i) a continuous frequency spectrum, and

(ii) sensitivity to the initial condition.

A chaotic series is bounded in magnitude. It comprises a
broad band of frequencies, which manifests as a noise-like
continuous frequency spectrum. Due to the characteristic
sensitivity to the initial condition, long-term prediction
is not possible for a chaotic series, because irrespective
of closely defined initial conditions the series and the
predictor trajectories may increasingly diverge with time
and eventually be uncorrelated with each other.

The oscillations in physiological series can be
modelled as

    $x(k+1) - x(k) = \frac{\alpha x(k-\tau)}{1 + x^{\gamma}(k-\tau)} - \beta x(k)$,    (8.3.1)

which is the discrete-time representation of the Mackey
Glass equation (Mackey and Glass, 1977). With suitable
choices for the parameters $\alpha$, $\beta$, $\gamma$ and $\tau$, (8.3.1) can model
periodic, quasiperiodic or chaotic processes, as discussed
in Example 8.3.2. A typical chaotic series generated from
the Mackey Glass equation is shown in Fig. 8.3.4(a).
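
A series of this kind can be generated with a few lines of code.
In the sketch below the recursion is taken in the form x(k+1) =
x(k) + alpha x(k-tau)/(1 + x(k-tau)^gamma) - beta x(k); the
parameter values are ones often quoted for the Mackey Glass
equation and are assumptions here, not the values used for the
figures in this book, and the initial history is simply held
constant.

```python
def mackey_glass(n, alpha=0.2, beta=0.1, gamma=10, tau=17, x0=1.2):
    """Generate n samples of the discrete-time Mackey Glass recursion."""
    x = [x0] * (tau + 1)                 # assumed constant initial history
    for _ in range(n):
        x_tau = x[-1 - tau]              # delayed value x(k - tau)
        x.append(x[-1] + alpha * x_tau / (1.0 + x_tau ** gamma) - beta * x[-1])
    return x[tau + 1:]

series = mackey_glass(2000)
nearby = mackey_glass(2000, x0=1.2 + 1e-6)
largest_gap = max(abs(p - q) for p, q in zip(series, nearby))
print(min(series), max(series), largest_gap)
```

Comparing the two runs, which differ only slightly in the initial
value, gives a simple way of examining sensitivity to the initial
condition for a given choice of parameters.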


8.3.2 Analysis using State-space Diagrams, SVD and FFT 

The modelling and characterization of data series or
processes with periodicity using SVD and FFT has been
discussed in Sec.7.7 and Sec. 2.5 respectively; the
characteristic features can also be revealed through the
state-space diagrams.

State-space diagrams

The x(k) vs. x(k-i), i ≥ 1, diagrams of any sequence (or
process) {x(k)} are referred to as the state-space diagrams
in two dimensional space; state-space diagrams can also be
formed in higher dimensional spaces.

A process with periodicity generates at least one 
closed contour in the state-space diagram. These closed 
contours are also called limit cycles. Some basic facts 
about these closed contours are as follows. 

(a) If the sequence {x(k)} is strictly periodic, a
repeating closed contour will be generated in the
state-space diagram.

(b) If the sequence {x(k)} is sinusoidal, the closed
contour will be elliptical in shape. If the sequence is
periodic but contains more than one sinusoidal component,
the closed contour will be repeating but will deviate from
the elliptical shape.

(c) In real life, periodic processes are rarely strictly
periodic. For example, consider the atmospheric CO2
concentration series (Appendix 7C). The series has a trend, and is
nearly periodic with yearly periodicity (see Figs. 8.3.1(a)
and 8.3.1(b)). Such processes will generate closely placed
closed contours (see Fig.8.3.1(d)). Note that the
state-space diagram can be meaningful when the data series
is detrended, as otherwise the contours move with a trend
(as in Fig.8.3.1(c)) making the analysis difficult.

(d) Processes with periodicity are generally bounded.
However, any tendency of unboundedness of a sequence is
explicitly revealed in the state-space diagram, which shows
gradually growing contours.

Figure 8.3.1 (a) The atmospheric CO2 concentration series,
showing monthly CO2 conc. in ppm over the years 1959-80,
(b) the detrended CO2 concentration series,
(c) state-space diagram for the CO2 concentration series,
(d) state-space diagram for the detrended CO2 series.

Remark : The state-space diagram being a simple and concise 
representation, it may be possible to demonstrate the 
characteristic pattern or abnormality in any periodic or 
quasiperiodic process through state-space diagrams. 
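The elliptical contour of a sinusoid, fact (b) above, can be checked numerically. The following is a minimal sketch in Python with NumPy (an illustrative choice); the helper name `delay_pairs` and the period of 25 samples are assumptions made for the illustration. For x(k) = sin(ωk), the pairs (x(k-1), x(k)) satisfy the ellipse equation x²(k) - 2cos(ω)x(k)x(k-1) + x²(k-1) = sin²(ω).

```python
import numpy as np

def delay_pairs(x, lag=1):
    """Return the coordinates (x(k-lag), x(k)) of a 2-D state-space diagram."""
    x = np.asarray(x, dtype=float)
    return x[:-lag], x[lag:]

# A sampled sinusoid; the period of 25 samples is an arbitrary choice.
omega = 2 * np.pi / 25
k = np.arange(200)
x = np.sin(omega * k)

x_prev, x_now = delay_pairs(x, lag=1)

# Every pair should lie on the ellipse, so the residual should vanish.
residual = (x_now**2 - 2 * np.cos(omega) * x_now * x_prev
            + x_prev**2 - np.sin(omega)**2)
print("largest deviation from the ellipse:", np.max(np.abs(residual)))
```

Plotting `x_now` against `x_prev` would give the state-space diagram itself; the residual check only confirms the shape without needing a plot.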

Comparative analysis 
(a) Periodic processes: 

The Fourier transform will produce components with respect
to distinct sinusoidal components present in the signal. The
SVD will produce one dominant singular value when the 
repeating or nearly repeating patterns (or periods) are 
aligned into consecutive rows of a matrix on which SVD is 
performed. As long as the sequence repeats or nearly 
repeats after a certain period length, there will be one 
dominant singular value. Thus if the repeating pattern 
comprises a number of sinusoidal components, there will be 
one singular value but as many Fourier components. 
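The contrast between the two analyses can be illustrated with a short sketch (Python with NumPy; the function names, the period of 12 samples and the three-component test signal are assumptions made for this illustration). A strictly periodic pattern built from three sinusoids is aligned into the rows of a matrix; the SVD then shows a single significant singular value, while the Fourier transform shows three components.

```python
import numpy as np

def aligned_singular_values(x, period):
    """Align consecutive periods into rows and return the singular values."""
    n_rows = len(x) // period
    X = np.asarray(x[:n_rows * period], dtype=float).reshape(n_rows, period)
    return np.linalg.svd(X, compute_uv=False)

def count_fourier_components(x, rel_tol=1e-6):
    """Count the frequency bins whose magnitude is not negligible."""
    mag = np.abs(np.fft.rfft(x))
    return int(np.sum(mag > rel_tol * mag.max()))

period = 12                        # assumed period length in samples
k = np.arange(20 * period)         # exactly 20 complete periods
x = (np.sin(2 * np.pi * k / period)
     + 0.5 * np.sin(2 * np.pi * 2 * k / period)
     + 0.25 * np.sin(2 * np.pi * 3 * k / period))

s = aligned_singular_values(x, period)
n_fourier = count_fourier_components(x)
print("ratio of second to first singular value:", s[1] / s[0])
print("number of significant Fourier components:", n_fourier)
```

The record covers a whole number of periods so that the Fourier components fall exactly on frequency bins; with a partial period the count would be blurred by leakage.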

For a periodic process, the state-space diagram will 
show at least one closed contour per period. There are some 
basic differences between the state-space diagram analysis 
and the SVD based analysis as follows. 

(i) There is no need for aligning the data in state-space 
analysis, as required in the case of SVD based analysis 
(see Sec.7.7 and Sec.11.2.1). 

(ii) Within the periods, if there is any further explicit 
periodicity, the state-space diagram will show a 
smaller closed contour corresponding to each of such 
periods within the main period. Whether the smaller 
contours are discernible or not depends on the nature 
of the concerned periods. In the case of SVD, detection 
of smaller periods will be difficult. 

(b) Quasiperiodic processes: 

A quasiperiodic process is expected to have periods of
varying length. Since for SVD analysis, the consecutive
periods have to be aligned into the rows of a matrix, in the
case of quasiperiodic processes either some interpolated
data are to be used, or some data have to be sacrificed as
discussed in Sec.11.2.1. There will be one dominant singular
value, but the other singular values will not be
insignificantly small, unlike the periodic case.

In the case of the state-space diagram analysis, if a
quasiperiodic process is generated from a linear combination
of p periodic processes, there will be at least q
closed contours, where 1 ≤ q ≤ p, if harmonic relations
exist between the component processes. In case of
nonharmonic components, the composite process will show at least p
closed contours.

(c) Chaotic processes: 

Chaotic processes can be analysed using the state-space
diagram and the Fourier transform; no meaningful analysis is
possible using SVD through conventional approaches.

The state-space diagram of a chaotic process shows a 
large number of closed contours with no apparent order of 
arrangement. The contours will be nonintersecting in three 
dimensional or larger dimensional spaces. The difference 
between a quasiperiodic process and a chaotic process can be 
difficult to distinguish from the state-space diagram alone, 
and spectral analysis can be of use in such cases. The 
chaotic process will show the presence of an almost continu- 
ous, broad band, noise-like frequency spectrum, whereas the 
frequency spectrum for a quasiperiodic process will have one 
or more nearly repeating bands; in both cases the amplitude 
will decay for larger frequencies. 

Remarks 

(1) Quasiperiodic series tend to show gradual phase shift if
a periodic frame is used to study their dynamics; the closed
contours on state-space diagrams serve as such a frame. They
are less rigid than the row (or column) spaces in SVD and
can pictorially exhibit the gradual phase shift by tracing
successive contours that slip away from previous ones by
incremental amounts.

(2) A periodic process loses its periodic property on being
sampled at a rate which is not a rational submultiple of its
periodicity. The FFT will not reveal the periodicity with sharp
peaks, but the state-space diagram will reveal the periodicity
through a closed contour. A typical example is

x(t) = sin(2πt/√2),

sampled at t = 1, 2, 3, ...
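A rough numerical illustration of Remark (2) follows (Python with NumPy; the record length of 500 samples is an arbitrary assumption). The sketch measures how much of the spectral power falls into the single largest FFT bin, and separately checks that the lag-1 pairs of the sampled series still lie on one closed elliptical contour.

```python
import numpy as np

N = 500                                  # record length, chosen arbitrarily
t = np.arange(1, N + 1)                  # sampling instants t = 1, 2, 3, ...
omega = 2 * np.pi / np.sqrt(2)           # period sqrt(2) is irrational
x = np.sin(omega * t)

# Share of the spectral power held by the largest single frequency bin.
# A value well below 1 indicates a smeared (leaky) peak.
power = np.abs(np.fft.rfft(x)) ** 2
peak_fraction = power.max() / power.sum()

# The points (x(k-1), x(k)) of the state-space diagram should all fall
# on a single ellipse even though no sample value ever repeats exactly.
x_prev, x_now = x[:-1], x[1:]
ellipse_residual = (x_now**2 - 2 * np.cos(omega) * x_now * x_prev
                    + x_prev**2 - np.sin(omega)**2)

print("power share of the largest FFT bin:", peak_fraction)
print("largest deviation from the ellipse:", np.max(np.abs(ellipse_residual)))
```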

Example 8.3.2 Produce the state-space diagrams for the 
data series generated by the Mackey-Glass equation (8.3.1). 

With reference to (8.3.1), consider the following cases. 

Case 1: with a = 0.2, β = 0.1, γ = 10 and τ = 6, a nearly
periodic sequence results.

Case 2: with τ = 17, and the other parameters left as in
Case 1, a relatively quasiperiodic sequence is generated.

Case 3: with a = 0.8, β = 0.4, γ = 16 and τ = 95, the
sequence tends to be chaotic.
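A series of this kind can be generated with a few lines of code. The sketch below (Python with NumPy) is only indicative: equation (8.3.1) is not reproduced here, so the commonly quoted form dx/dt = a x(t-τ)/(1 + x^γ(t-τ)) - β x(t) is assumed, discretized by a unit-step Euler rule with a constant initial history of 1.2. The discretization, the initial history, the plotting lag of 5 and the function name `mackey_glass` are all assumptions and may differ from what was used to produce the figures.

```python
import numpy as np

def mackey_glass(n_samples, a=0.2, beta=0.1, gamma=10, tau=17, x0=1.2):
    """Unit-step Euler sketch of the (assumed) Mackey-Glass equation."""
    x = np.full(n_samples + tau, float(x0))   # constant initial history
    for k in range(tau, n_samples + tau - 1):
        delayed = x[k - tau]
        x[k + 1] = x[k] + a * delayed / (1.0 + delayed**gamma) - beta * x[k]
    return x[tau:]

# Parameters of Case 2; other cases follow by changing the arguments.
series = mackey_glass(1000, a=0.2, beta=0.1, gamma=10, tau=17)

# Coordinates of a two dimensional state-space diagram, x(k) vs. x(k-5).
lag = 5
state_space = np.column_stack([series[:-lag], series[lag:]])
print(state_space.shape, series.min(), series.max())
```

Plotting the second column of `state_space` against the first gives the state-space diagram; `np.abs(np.fft.rfft(series))` gives a frequency spectrum for comparison.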

The state-space diagrams and the corresponding Fourier
transforms for the above mentioned cases are shown in
Fig.8.3.2, Fig.8.3.3 and Fig.8.3.4 respectively.

Figure 8.3.2 (a) The nearly periodic Mackey-Glass
series (Case 1) with a=0.2, β=0.1, γ=10, and τ=6,
(b) the state-space diagram,
(c) the frequency spectrum showing primarily one
frequency component.

Note that Case 1 shows primarily one frequency
component being present. Case 2 is an example of a
quasiperiodic series with primarily two different periodicities;
the Fourier transform also confirms the presence of two
prime bands of frequency components. The Fourier transform
of Case 3 shows the presence of a wide range of frequency
components, which is consistent with the process being chaotic.

Figure 8.3.3 (a) The quasiperiodic MG series (Case 2)
generated with a=0.2, β=0.1, γ=10, and τ=17,
(b) the state-space diagram,
(c) the frequency spectrum showing primarily two bands
of frequencies.


8.4 SELECTED NONLINEAR MODELS 

Some of the widely studied models which lend themselves to 
physical, biological and social sciences etc. are summarized 
here. A detailed study is beyond the scope of this book; 
only the underlying principles of natural representations of 
diverse dynamic processes are presented here.

Figure 8.3.4 (a) The chaotic Mackey-Glass series
(Case 3) generated with a=0.2, β=0.1, γ=16, and τ=95,
(b) the state-space diagram,
(c) the frequency spectrum showing relatively
widely spread frequency components.


8.4.1 Bilinear Models 

The nonlinearity in a bilinear model is present as 
multiplicative terms of two process variables: 

$$ y(k) + \sum_{i=1}^{n} a_i\, y(k-i) = \sum_{i=1}^{n} b_i\, w(k-i) + \sum_{h=1}^{p} \sum_{j=1}^{m} g_{hj}\, y(k-h)\, w(k-j), \qquad (8.4.1) $$

where {y(k)} is the output sequence or a discrete-time
series; {w(k)} is a sequence of input variables or equation
error (or residual), which may also be a set of independent
random variables; p, m and n are integers.

Figure 8.4.1 The state-space representation of the
bilinear model (8.4.2 - 8.4.3).

A state-space representation of (8.4.1) is given by 

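$$ x(k+1) = A\,x(k) + b\,w(k+1) + \sum_{j=1}^{m} g_j^T\, w(k-j)\, x(k), \qquad (8.4.2) $$

$$ y(k) = c^T x(k), \qquad (8.4.3) $$

where

$$ A = \begin{bmatrix} -a_1 & -a_2 & \cdots & -a_{n-1} & -a_n \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}, $$

$$ c^T = [1 \;\; 0 \;\; 0 \;\; \cdots \;\; 0], \quad g_j^T = [g_{1j} \;\; g_{2j} \;\; \cdots \;\; g_{pj}], $$

for j = 1,2,..., m; and x is of size max(n,p).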

Thus the bilinear process given by (8.4.2-8.4.3) and shown
in Fig.8.4.1 is linear either in the state x or in the
input (or equation error) w but not in both. The process is
called

(a) homogeneous in state, if b = 0, or 

(b) homogeneous in input (or equation error), if A = 0, or 

(c) strictly bilinear, if A = 0 and b = 0. 

Since the linear and the nonlinear parts of a bilinear model 
are not orthogonal to each other, it cannot be said that the 
nonlinear part of the model represents the nonlinear part of 
the process. 
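The effect of the multiplicative terms can be seen in a small simulation of the input-output form of the bilinear model. The sketch below (Python with NumPy) is illustrative only: the function name `simulate_bilinear`, the first order coefficients and the noise level are assumed values, not taken from any data set in this book. With all bilinear coefficients set to zero, doubling the input doubles the output; with a nonzero bilinear coefficient it does not.

```python
import numpy as np

def simulate_bilinear(a, b, g, w):
    """Simulate y(k) = -sum a_i y(k-i) + sum b_i w(k-i)
                       + sum_h sum_j g_hj y(k-h) w(k-j), zero initial state."""
    a = np.atleast_1d(a).astype(float)
    b = np.atleast_1d(b).astype(float)
    g = np.atleast_2d(g).astype(float)      # g[h-1, j-1] holds g_hj
    w = np.asarray(w, dtype=float)
    p, m = g.shape
    start = max(len(a), len(b), p, m)
    y = np.zeros(len(w))
    for k in range(start, len(w)):
        ar = sum(a[i - 1] * y[k - i] for i in range(1, len(a) + 1))
        ma = sum(b[i - 1] * w[k - i] for i in range(1, len(b) + 1))
        bl = sum(g[h - 1, j - 1] * y[k - h] * w[k - j]
                 for h in range(1, p + 1) for j in range(1, m + 1))
        y[k] = -ar + ma + bl
    return y

rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal(300)          # small random input (assumed)
a, b = [-0.5], [1.0]                        # illustrative coefficients

y_lin = simulate_bilinear(a, b, [[0.0]], w)
y_lin2 = simulate_bilinear(a, b, [[0.0]], 2 * w)
y_bil = simulate_bilinear(a, b, [[0.4]], w)
y_bil2 = simulate_bilinear(a, b, [[0.4]], 2 * w)

linear_scales = np.allclose(y_lin2, 2 * y_lin)      # superposition holds
bilinear_scales = np.allclose(y_bil2, 2 * y_bil)    # superposition fails
print(linear_scales, bilinear_scales)
```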

Bilinear models can be formulated in continuous-time
form or discrete-time form, which again can be stochastic or
deterministic in nature.

To identify a bilinear time series model, Subba Rao and 
Gabr (1984, p.176) first consider the best subset AR model 
(see Sec.3.6.3); appropriate bilinear terms are then intro- 
duced through parameter estimation using the Newton-Raphson 
method, aiming at the best model in terms of minimum value 
of AIC. 

Example 

The sunspot activity series (Appendix 8A), modelled on 221 
observations (i.e. for the years 1700 - 1920) is given by 
(Subba Rao and Gabr, 1984, p.197): 

$$ \begin{aligned} y(k) &- 1.5012\,y(k-1) + 0.767\,y(k-2) - 0.1152\,y(k-9) \\ &= 6.886 - 0.1458\,y(k-2)e(k-1) + 0.006312\,y(k-8)e(k-1) \\ &\quad - 0.007152\,y(k-1)e(k-3) - 0.006047\,y(k-4)e(k-3) \\ &\quad + 0.003619\,y(k-1)e(k-6) + 0.004334\,y(k-2)e(k-4) \\ &\quad + 0.001782\,y(k-3)e(k-2) + e(k), \qquad (8.4.4) \end{aligned} $$

where {e(k)} is a random noise sequence; the normalized AIC
= 4.927.


8.4.2 Threshold Models 

If a nonlinear system displays varying dynamic characteris- 
tics over different operating regimes, it may be difficult 
to represent the process by a single model. Threshold models 
can be particularly useful in such cases. The basic idea is 
to express the nonlinearity in terms of a number of linear 
or nonlinear models, each being valid within specific 
operating regimes, defined by the thresholds. 

Thresholds are numeric values. Threshold modelling
defines different submodels which are valid subject to
crossing of the threshold by a lagged observation, or a 
function of lagged observations. Each submodel is a partial 
description of the underlying nonlinear process. If the 
submodels are linear, a criterion like AIC is used for 
model order selection and the parameters are estimated using 
the least squares method or the maximum likelihood method. 
Each submodel is treated separately. AIC can also be used to 
decide the variable on which the threshold is to be set, as 
well as the appropriate value of the threshold. For detailed 
study on threshold modelling refer to Tong (1983). 


Example 

The sunspot activity series (Appendix 8A), over the years 
1700 to 1920 is modelled by Tong and Lim (1980) as 


$$ y(k) = \begin{cases} 10.544 + 1.692\,y(k-1) - 1.1592\,y(k-2) + 0.23674\,y(k-3) \\ \quad + 0.1503\,y(k-4) + e_1(k), & \text{if } y(k-3) \le 36.6, \\[4pt] 7.8041 + 0.7432\,y(k-1) - 0.0409\,y(k-2) - 0.202\,y(k-3) \\ \quad + 0.173\,y(k-4) - 0.2266\,y(k-5) + 0.0189\,y(k-6) \\ \quad + 0.1612\,y(k-7) - 0.256\,y(k-8) + 0.319\,y(k-9) \\ \quad - 0.3891\,y(k-10) + 0.4306\,y(k-11) - 0.03974\,y(k-12) \\ \quad + e_2(k), & \text{if } y(k-3) > 36.6, \end{cases} \qquad (8.4.5) $$

where $e_1$ and $e_2$ are white noise sequences. The normalized
AIC for this model is 5.00.
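The switching mechanism of such a model is easy to express in code. The following sketch (plain Python) evaluates the one-step-ahead conditional mean of the two-regime model, i.e. with the noise terms set to zero; the coefficients are those quoted in (8.4.5), while the function name `predict_next` and the constant test histories are assumptions made purely for illustration.

```python
# Regimes of the threshold model: (constant, [coefficients of y(k-1), y(k-2), ...])
REGIME_LOW = (10.544, [1.692, -1.1592, 0.23674, 0.1503])
REGIME_HIGH = (7.8041, [0.7432, -0.0409, -0.202, 0.173, -0.2266, 0.0189,
                        0.1612, -0.256, 0.319, -0.3891, 0.4306, -0.03974])
THRESHOLD = 36.6   # threshold value
DELAY = 3          # the threshold is tested on y(k-3)

def predict_next(history):
    """One-step-ahead prediction; history[-1] is y(k-1), history[-2] is y(k-2), ..."""
    if len(history) < 12:
        raise ValueError("at least 12 past values are needed")
    if history[-DELAY] <= THRESHOLD:
        const, coeffs = REGIME_LOW
    else:
        const, coeffs = REGIME_HIGH
    return const + sum(c * history[-i] for i, c in enumerate(coeffs, start=1))

# Artificial constant histories, used only to exercise both regimes.
low_case = predict_next([10.0] * 12)
high_case = predict_next([100.0] * 12)
print(low_case, high_case)
```

Only the lagged value y(k-3) decides which submodel is active; the two submodels are otherwise ordinary AR predictors of orders 4 and 12.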

Nonlinear threshold AR models have been studied by 
Ozaki (1981, 1985). The basic characteristic of such models 
is that at least one submodel is expressed as a nonlinear 
function having one or more parameters, which are themselves 
nonlinear functions of the amplitude. A generalized 
nonlinear AR model is given by 


$$ y(k) = \begin{cases} a_1 y(k-1) + \ldots + a_n y(k-n) + e_1(k), & \text{if } |y(k-1)| \ge C, \\[4pt] f_1(y(k-1))\,y(k-1) + \ldots + f_n(y(k-1))\,y(k-n) + e_2(k), & \text{if } |y(k-1)| < C, \end{cases} $$

where

$$ f_i(y(k-1)) = b_0 + b_1 y(k-1) + \ldots + b_m y^m(k-1). $$


8.4.3 Exponential Models

The basic characteristic of an exponential autoregressive 
model (Ozaki, 1985) is one or more parameters being negative 
exponential functions of the variable. For example, consider 
a second order exponential AR model: 

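$$ y(k) = \left(a_1 - b_1 e^{-y^2(k-1)}\right) y(k-1) + \left(a_2 - b_2 e^{-y^2(k-1)}\right) y(k-2) + e(k) \qquad (8.4.6) $$

$$ \phantom{y(k)} = \alpha_1 y(k-1) + \alpha_2 y(k-2) + e(k), \text{ (say)}. \qquad (8.4.7) $$

Thus parameter values of the exponential model are dependent
on the magnitude of y(k-1); if y(k-1) is very small,

$$ \alpha_1 \approx a_1 - b_1, \text{ and } \alpha_2 \approx a_2 - b_2, $$

and if y(k-1) is very large,

$$ \alpha_1 \approx a_1 \text{ and } \alpha_2 \approx a_2. $$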

Thus the exponential function within the parameter acts as a 
smooth threshold. 

The parameters $a_i$, $b_i$ in (8.4.6) can be estimated by the
standard least squares estimation method as the model is
linear-in-parameters. A suitable criterion may be used to
determine the model order in the case of a generalized model:

$$ y(k) = f_1(y(k-1))\,y(k-1) + f_2(y(k-1))\,y(k-2) + \ldots + f_n(y(k-1))\,y(k-n) + e(k), $$

where

$$ f_i(y(k-1)) = a_i + b_i e^{-y^2(k-1)}. $$

An extended exponential model is represented by

$$ y(k) = \left\{ a_0 + \left( b_0 + b_1 y(k-1) + \ldots + b_m y^m(k-1) \right) e^{-y^2(k-1)} \right\} y(k-1) + e(k), $$

which is equivalent to the exponential AR model for m = 0.
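Because the second order exponential AR model is linear in its parameters, its estimation reduces to an ordinary least squares problem with the regressors y(k-1), -exp(-y²(k-1))y(k-1), y(k-2) and -exp(-y²(k-1))y(k-2). The sketch below (Python with NumPy) simulates such a model and recovers the parameters; the parameter values, the noise level, the record length and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_exp_ar(a, b, n, noise_std, rng):
    """Simulate the second order exponential AR model with white noise input."""
    y = np.zeros(n)
    e = noise_std * rng.standard_normal(n)
    for k in range(2, n):
        damp = np.exp(-y[k - 1] ** 2)
        y[k] = ((a[0] - b[0] * damp) * y[k - 1]
                + (a[1] - b[1] * damp) * y[k - 2] + e[k])
    return y, e

def fit_exp_ar(y):
    """Least squares estimate of [a1, b1, a2, b2]; also returns the regressors."""
    y1, y2 = y[1:-1], y[:-2]              # y(k-1) and y(k-2) for k = 2, 3, ...
    damp = np.exp(-y1 ** 2)
    X = np.column_stack([y1, -damp * y1, y2, -damp * y2])
    theta, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    return theta, X

rng = np.random.default_rng(0)
theta_true = np.array([0.5, -0.8, -0.3, 0.3])       # assumed a1, b1, a2, b2
y, e = simulate_exp_ar(theta_true[[0, 2]], theta_true[[1, 3]],
                       n=20000, noise_std=0.5, rng=rng)

theta_hat, X = fit_exp_ar(y)
rss_fit = float(np.sum((y[2:] - X @ theta_hat) ** 2))
rss_true = float(np.sum(e[2:] ** 2))                # residuals at the true parameters
print("estimated [a1, b1, a2, b2]:", theta_hat)
```

The least squares residual sum of squares can never exceed the value obtained at the true parameters, which provides a simple sanity check on the fit.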
Remark 

The linear and nonlinear threshold models and the
exponential AR models belong to the category of models with
amplitude-dependent variable parameters. The purpose of
constraining the parameters is stabilization, that is, to
arrest explosive tendencies of polynomial models.


8.5 CONCLUSIONS

There are many possibilities for modelling nonlinear 
processes, and there is a lot of flexibility as regards 
model structure selection and parameterization. However some 
of the basic problems are (a) there is a certain degree of 
heuristics associated with almost all modelling methods, and 
(b) the statistical procedures for the assessment of the 
convergence and the stability of the identified parameters 
and hence the quality of the model are not adequately 
developed. So proper identification can be difficult. Again, 
there can be different models for the same process, although 
no model can be said to be the best. As such, the model is 
not expected to be a reconstruction of the process, rather 
it is intended to serve as a set of operators on the 
identified set of inputs, producing similar output as 
expected from the process. The problem is that in real life 
the process output is usually contaminated with noise and 
other disturbances, whereas ideally the model output should
follow the true output of the underlying representative
process, which is unknown. So validity of a model should be
examined carefully, for example through cross validation 
(i.e. validation against sets of representative data not 
used for modelling) or through prediction performance etc. 

A summary of some of the well-studied nonlinear models 
namely, the bilinear model, the threshold model, and the 
exponential model has been presented in this chapter. A 
large class of processes can be represented by these models. 
The different models are not universally applicable. It is
important that proper choice of the appropriate model is
made depending on the application concerned, for which
unfortunately there is no definite guideline.


CHAPTER 9


MODELLING OF NONLINEAR PROCESSES USING GMDH 


Nonlinear processes can be modelled using hierarchical 
stages of simple nonlinearity, where each building 
block is represented by a linear-in-the-parameter 
model. 


9.1 INTRODUCTION 

The identification and modelling of processes with nonlinea- 
rity can often be a difficult task mainly because of the 
following reasons: 

(a) unknown or partially known structures, 

(b) large dimensions with many variables having nonlinear 
interrelations, and 

(c) availability of limited operational data, etc. 

The traditional functional series approaches of Volterra and 
Wiener require prohibitively intensive computation even for 
low order representations; a large parameter set as well as 
long data sequences are required for estimation. The Group 
Method of Data Handling (GMDH) offers a powerful 
alternative, which has been successfully used in diverse 
areas (Farlow, 1984). 

GMDH, which was developed by Ivakhnenko (1970), has a 
multilayer hierarchical structure. The basic building block 
is a two-input one-output submodel, represented (typically) 
by a linear-in-the-parameter quadratic polynomial. A bank of 
submodels forms a layer, and a bank of layers forms the 
model. The submodel outputs of one layer undergo a selection
process for assessment of the richness of information
contained, and the selected outputs are then forwarded as
inputs to the following layer and so on. Thus a process with 
a high order of nonlinearity can be configured into a 
multilayer model comprising basic building blocks. 

A crude single stage approximation of GMDH is to 
consider a set of all possible variables and their nonlinear 
transforms which can be related with the output through a
linear-in-the-parameter model. A subset selection is
performed on the input data set, and the selected data set 
is used for parameter estimation. 

The organization of this chapter is as follows. The GMDH
architecture is introduced in Sec.9.2. The two key issues on
which the success of GMDH largely depends are the selection
of candidate inputs for the layers and the selection of
submodel structures, which are treated in Sec.9.3. Sec.9.4
features one typical application of GMDH involving a
multi-input single-output environmental process. Sec.9.5
presents a single-layer nonlinear model which incorporates
nonlinearization of input variables and identification using
orthogonal transformation. All the models discussed in this
chapter are linear-in-the-parameter models.


9.2 THE GMDH ARCHITECTURE 

9.2.1 Multinomial Representation of Nonlinearity 

Consider a multi-input single-output nonlinear process. The
output (y) can be, in general, expressed as a multinomial
function (also called Kolmogorov-Gabor polynomial)
connecting all possible inputs and their combinations:

$$ y = a_0 + \sum_{i=1}^{N} a_i r_i + \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} r_i r_j + \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} a_{ijk} r_i r_j r_k + \ldots \qquad (9.2.1) $$

where $r_i$, $r_j$ etc. may represent known individual inputs, the
square or higher powers of the same inputs; the time-lagged
or past values of individual inputs or the output may also
be treated as input to the model. $a_i$, $a_{ij}$, $a_{ijk}$ are the
unknown parameters of the system. Equation (9.2.1) is the
discrete analogue of the Volterra series (Schetzen, 1980).
The conventional methods for the estimation of the
parameters of (9.2.1) will require large sets of data and
prohibitively intensive computation. GMDH offers a
simplified approach to this problem.
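The growth of the parameter set in (9.2.1) can be made concrete with a short sketch (Python with NumPy and the standard library; the function names are assumptions). It enumerates the distinct product terms of a Kolmogorov-Gabor polynomial up to a given degree, treating products that differ only in the order of their factors as one term, and builds the corresponding regressor matrix.

```python
from itertools import combinations_with_replacement
from math import comb

import numpy as np

def kg_terms(n_inputs, degree):
    """Index tuples of all distinct monomials up to the given degree.
    The empty tuple stands for the constant term a_0."""
    terms = [()]
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(range(n_inputs), d))
    return terms

def kg_design_matrix(R, degree):
    """Regressor matrix whose columns are the monomials of the inputs R."""
    R = np.asarray(R, dtype=float)
    terms = kg_terms(R.shape[1], degree)
    columns = [np.prod(R[:, list(t)], axis=1) for t in terms]
    return np.column_stack(columns), terms

# Number of parameters for a few (inputs, degree) combinations.
for n_inputs, degree in [(3, 2), (10, 3), (10, 5)]:
    print(n_inputs, degree, len(kg_terms(n_inputs, degree)))

# Monomials of the single observation (r1, r2) = (2, 3) up to degree 2.
F, terms = kg_design_matrix([[2.0, 3.0]], degree=2)
print(terms)
print(F)
```

The number of distinct terms equals the binomial coefficient C(N+d, d) for N inputs and degree d, which is why direct estimation quickly becomes impractical.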


9.2.2 Structural Layout of GMDH 

The GMDH model of (9.2.1) is shown in Fig.9.2.1. It consists
of a series of layers, each having a number of submodels. A
submodel (Fig.9.2.2) has one or two inputs and one output,
and is modelled usually by a polynomial of maximum order
two; for the parameter estimation of a submodel, the final
output (y) is presumed to be the submodel output, and the
estimated output is used as the submodel output for
subsequent modelling operations. All sensible combinations
of candidate inputs to a layer are accommodated as inputs to
the submodels of the layer. Each layer is followed by a
stage, called the layer selector. The function of the layer
selector is to select those submodel outputs which contain
maximum yet mutually independent information about the final
output. These outputs, which are also called intermediate
variables or pseudo-outputs, are passed on as inputs to the
next layer. The outputs of subsequent stages are expected to
bear stronger correlation with the final output.

Figure 9.2.1 The schematic structural layout of GMDH:
(a) GMDH structure,
(b) the generic submodel description,
$y = a_0 + a_1 r_1 + a_2 r_2 + a_3 r_1^2 + a_4 r_2^2 + a_5 r_1 r_2$.

Figure 9.2.2 Modelling approach for submodels in GMDH:
(a) Parameter estimation scheme,
(b) Pseudo-output generation scheme.

There is no constraint on the number of submodels in a 
layer or the number of layers in the overall model. Due to 
the generation of intermediate variables with increasingly 
richer information, the number of submodels naturally drops 
in subsequent layers. At the final layer, the output (y) 
may be expressed as a linear function of the inputs to that 
layer. 
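The estimation of a single two-input submodel and the generation of its pseudo-output can be sketched as follows (Python with NumPy). The quadratic form is that of the generic submodel of Fig.9.2.1(b); the function names and the synthetic noise-free data are assumptions used only to show that the coefficients are recovered by least squares.

```python
import numpy as np

def submodel_features(r1, r2):
    """Regressors of the generic submodel: 1, r1, r2, r1^2, r2^2, r1*r2."""
    r1 = np.asarray(r1, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    return np.column_stack([np.ones_like(r1), r1, r2, r1**2, r2**2, r1 * r2])

def fit_submodel(r1, r2, y):
    """Least squares estimate of a_0 ... a_5 with the final output y as target."""
    coef, *_ = np.linalg.lstsq(submodel_features(r1, r2),
                               np.asarray(y, dtype=float), rcond=None)
    return coef

def pseudo_output(coef, r1, r2):
    """Estimated submodel output, passed on to the next layer."""
    return submodel_features(r1, r2) @ coef

# Synthetic, noise-free data generated from assumed coefficients.
rng = np.random.default_rng(1)
r1, r2 = rng.uniform(-1.0, 1.0, (2, 40))
coef_true = np.array([1.0, 2.0, -1.0, 0.5, 0.25, -0.75])
y = submodel_features(r1, r2) @ coef_true

coef = fit_submodel(r1, r2, y)
y_hat = pseudo_output(coef, r1, r2)
print(np.round(coef, 4))
```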


9.3 GMDH: DESIGN AND VALIDATION OF MODELS 

The design of GMDH models mainly involves the following
problems:

(a) input selection for the first and subsequent layers,

(b) structure selection, including determination of
suitable variables for the submodels and estimation of the
parameters of the submodels in different layers, and

(c) the layer termination, which is effected when
modelling error cannot be reduced any further by
additional layers.

Input selection and layer selection 

Input selection or preselection refers to selection of the
inputs to the first layer, whereas layer selection refers to
the selection of inputs to second or any subsequent layer; 
the basic purpose is the same for the input selector and the 
layer selector. 

The set of candidate inputs x(k) to the first layer
may include all the individual inputs ($u_1$, $u_2$ etc.), the
time delayed inputs ($u_1(k-1)$, $u_1(k-2)$ etc.), and the time
delayed outputs (y(k-1), y(k-2) etc.); in other words all
variables which may be strongly or weakly correlated with
the output are considered:

$$ x(k) = \{u_1(k), u_1(k-1), \ldots, u_1(k-n), u_2(k), \ldots, u_2(k-n), \ldots, u_N(k-1), \ldots, u_N(k-n), y(k-1), y(k-2), \ldots, y(k-n)\}. \qquad (9.3.1) $$

In the selection of the input variables, their relevance 
with respect to the output should be given due 
consideration. If necessary, instead of the actual values, 
nonlinearly transformed values (Sec. 8.2.3) of the input 
variables may be used. Those variables which are relatively 
strongly correlated with the output are selected. 
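A possible way of forming the candidate set and ranking its members by correlation with the output is sketched below (Python with NumPy). The function names, the naming of the candidates and the synthetic single-input process, in which the output depends on the input delayed by two samples, are assumptions made for illustration.

```python
import numpy as np

def lagged_candidates(U, y, n):
    """Candidate matrix with u_j(k), ..., u_j(k-n) and y(k-1), ..., y(k-n)."""
    U = np.asarray(U, dtype=float)
    y = np.asarray(y, dtype=float)
    N = len(y)
    cols, names = [], []
    for j in range(U.shape[1]):
        for lag in range(0, n + 1):
            cols.append(U[n - lag:N - lag, j])
            names.append(f"u{j + 1}(k-{lag})" if lag else f"u{j + 1}(k)")
    for lag in range(1, n + 1):
        cols.append(y[n - lag:N - lag])
        names.append(f"y(k-{lag})")
    return np.column_stack(cols), y[n:], names

def rank_by_correlation(X, target, names):
    """Candidates sorted by the magnitude of their correlation with the output."""
    corr = [abs(np.corrcoef(X[:, i], target)[0, 1]) for i in range(X.shape[1])]
    order = np.argsort(corr)[::-1]
    return [(names[i], corr[i]) for i in order]

# Synthetic process: y(k) = 0.9 u(k-2) + small noise (assumed for illustration).
rng = np.random.default_rng(0)
u = rng.standard_normal((500, 1))
y = np.zeros(500)
y[2:] = 0.9 * u[:-2, 0] + 0.1 * rng.standard_normal(498)

X, target, names = lagged_candidates(u, y, n=3)
ranking = rank_by_correlation(X, target, names)
for name, c in ranking:
    print(f"{name:10s} {c:.3f}")
```

Simple correlation only captures linear association; nonlinearly transformed candidates can be appended as further columns before ranking.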

For layer selection the indices of submodel structure 
validation like the cumulative mean square error (MSE) 
cross validation for each submodel may be considered. 
Submodel outputs with relatively low values of cumulative 
MSE only are selected and the rest are dropped. 

Collinearity between the candidate inputs, both at the
preselector and at the layer selector, is undesirable.
The chances of collinearity are particularly high at higher
layers of GMDH. The collinearity can be detected using SVD,
and to eliminate collinearity subset selection using QRcp
factorization (discussed in Sec.3.6.2) may be used.

Once distinct collinearity among the candidate set of
variables is eliminated, the significant set of variables
may be subsequently chosen successively using the modified
QRcp factorization as discussed in Sec.3.6.4.
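The detection of collinearity through the singular values, and the ordering of the candidate columns by a pivoting strategy, can be sketched as follows (Python with NumPy). This is a hand-written greedy orthogonalization in the spirit of QR with column pivoting, not the specific QRcp procedure of Sec.3.6.2; the function names, the tolerance and the test matrix are assumptions.

```python
import numpy as np

def numerical_rank(X, rel_tol=1e-8):
    """Number of singular values that are not negligible relative to the largest."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def pivoted_column_order(X):
    """Greedy ordering: repeatedly pick the column with the largest remaining
    norm and remove its direction from the columns not yet picked."""
    R = np.array(X, dtype=float)
    remaining = list(range(R.shape[1]))
    order = []
    while remaining:
        norms = [np.linalg.norm(R[:, j]) for j in remaining]
        best = remaining[int(np.argmax(norms))]
        order.append(best)
        remaining.remove(best)
        nq = np.linalg.norm(R[:, best])
        if nq > 0.0:
            q = R[:, best] / nq
            for j in remaining:
                R[:, j] -= (q @ R[:, j]) * q
    return order

# Four candidate variables, the last being the sum of the first two.
rng = np.random.default_rng(0)
X3 = rng.standard_normal((50, 3))
X = np.column_stack([X3, X3[:, 0] + X3[:, 1]])

rank = numerical_rank(X)
order = pivoted_column_order(X)
selected = order[:rank]          # a linearly independent subset of columns
print("rank:", rank, "selected columns:", selected)
```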

Structure selection and validation 

Structure selection and validation are closely related
concepts, which in the present context relate to the
appropriate modelling of the submodels. This subject is
discussed in Sec.3.6; a brief outline follows.

The objective is to determine the model order and
parameter values for each submodel, when a set of input and
output data are available. It is understood that the models
are linear in the parameters. When the model is
underparameterized, that is the model order is too low, the
input-output data will show high model-misfit with the
estimated parameters. When the model is overparameterized,
the model misfit may be low but overfitting of the data will
result; there will be a tendency to model both the
information and the noise in the data, and the validity of
the model for sets of data not used for modelling will be
poor. The two common approaches for structure selection and
validation are

(i) to use Cp statistic (Sec.3.6.4), 

(ii) cross validation (Sec.3.6.5). 
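The behaviour of the model misfit under increasing parameterization can be illustrated by a small cross validation experiment (Python with NumPy). The polynomial test process, the noise level, the sample sizes and the function name are assumptions; the sketch fits linear-in-the-parameter models of increasing order on one data set and evaluates them on a second set not used for fitting.

```python
import numpy as np

def mse_by_order(x_fit, y_fit, x_val, y_val, max_order):
    """Fit polynomial models of order 1..max_order; return fit and validation MSE."""
    fit_mse, val_mse = [], []
    for order in range(1, max_order + 1):
        A_fit = np.vander(x_fit, order + 1, increasing=True)
        A_val = np.vander(x_val, order + 1, increasing=True)
        coef, *_ = np.linalg.lstsq(A_fit, y_fit, rcond=None)
        fit_mse.append(float(np.mean((y_fit - A_fit @ coef) ** 2)))
        val_mse.append(float(np.mean((y_val - A_val @ coef) ** 2)))
    return fit_mse, val_mse

def process(x, rng):
    """Assumed second order process with additive noise."""
    return 1.0 - 2.0 * x + 3.0 * x**2 + 0.2 * rng.standard_normal(len(x))

rng = np.random.default_rng(3)
x_fit = rng.uniform(-1.0, 1.0, 30)
x_val = rng.uniform(-1.0, 1.0, 30)
y_fit, y_val = process(x_fit, rng), process(x_val, rng)

fit_mse, val_mse = mse_by_order(x_fit, y_fit, x_val, y_val, max_order=8)
best_order = int(np.argmin(val_mse)) + 1
for order, (f, v) in enumerate(zip(fit_mse, val_mse), start=1):
    print(f"order {order}: fit MSE {f:.4f}, validation MSE {v:.4f}")
print("order with the lowest validation MSE:", best_order)
```

The misfit on the fitting data can only decrease as the order grows, so it cannot by itself indicate the appropriate order; the validation error is the quantity to examine.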

Summary of GMDH 

GMDH is performed through the following steps:

(1) The set of all the available inputs, x in (9.3.1), is
formed. Preselection is performed using correlation
analysis between each variable and the final output, or
through subset selection.

(2) The modelling of the submodels is performed.

(3) The layer selection is performed using the information
criterion index or the cross validation index.

(4) Steps (2) and (3) are repeated sequentially until an
additional layer does not improve the information
criterion index or the cross validation index.

(5) The final output is expressed as a linear function of
the outputs of the last stage.
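Steps (2) to (4) can be sketched as a short loop (Python with NumPy). This is a simplified illustration under stated assumptions, not a full implementation: every pair of candidate inputs feeds a quadratic submodel, layer selection keeps a fixed number of pseudo-outputs with the lowest validation MSE, no preselection or collinearity check is made, and the best pseudo-output is taken as the model output instead of the linear combination of step (5). The function names, the parameter `keep` and the synthetic data are assumptions.

```python
from itertools import combinations

import numpy as np

def quad_features(r1, r2):
    """Regressors of the two-input quadratic submodel."""
    return np.column_stack([np.ones_like(r1), r1, r2, r1**2, r2**2, r1 * r2])

def gmdh(X_fit, y_fit, X_val, y_val, keep=3, max_layers=5):
    """Return the best validation MSE of each accepted layer."""
    best_mse, history = np.inf, []
    for _ in range(max_layers):
        candidates = []
        for i, j in combinations(range(X_fit.shape[1]), 2):
            F_fit = quad_features(X_fit[:, i], X_fit[:, j])
            coef, *_ = np.linalg.lstsq(F_fit, y_fit, rcond=None)
            out_fit = F_fit @ coef
            out_val = quad_features(X_val[:, i], X_val[:, j]) @ coef
            mse = float(np.mean((y_val - out_val) ** 2))
            candidates.append((mse, out_fit, out_val))
        candidates.sort(key=lambda c: c[0])
        if candidates[0][0] >= best_mse:       # layer termination
            break
        best_mse = candidates[0][0]
        history.append(best_mse)
        selected = candidates[:keep]           # layer selection
        X_fit = np.column_stack([c[1] for c in selected])
        X_val = np.column_stack([c[2] for c in selected])
    return history

# Synthetic three-input process (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 3))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] ** 2 + 0.1 * rng.standard_normal(120)

history = gmdh(X[:80], y[:80], X[80:], y[80:])
print("validation MSE per accepted layer:", history)
```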


9.4 MODELLING THE COD PROCESS IN OSAKA BAY 


The chemical oxygen demand (COD) can be considered to be an 
index of water pollution in the sea. COD concentration is 
monitored at a number of stations in the Osaka bay along 
with water temperature, transparency and dissolved oxygen 
concentration. Altogether 84 sets of monthly data are 
available (see Appendix 9), for the period 1976 to 1983; 54 
sets of data are used as the training set and 15 sets of 
data are used as the testing set. The prediction performance 
of the model is tested on the last 15 sets of data. 


Inputs and outputs 

No. of inputs: 3

Input variables:  $x_1$  water temperature
                  $x_2$  water transparency
                  $x_3$  dissolved oxygen concentration

Output variable:  y    COD concentration


Preselection

Based on the process information, all the input variables
are considered as inputs to the first layer, bypassing the
stage of preselection.

First layer 

The submodels of the first layer are modelled as

$$ u_1 = 5.1286 + 0.0162\,x_1 - 1.0918\,x_2 + 0.0060\,x_1^2 + 0.2864\,x_2^2 - 0.0688\,x_1 x_2, $$

$$ u_2 = -0.3446 - 1.7541\,x_2 + 2.1154\,x_3 + 0.4878\,x_2^2 - 0.1018\,x_3^2 - 0.2533\,x_2 x_3, $$

$$ u_3 = 6.8907 - 0.4275\,x_1 - 0.5813\,x_3 - 0.0097\,x_1^2 + 0.0292\,x_3^2 + 0.0290\,x_1 x_3, $$

where $u_1$, $u_2$, and $u_3$ are the pseudo-outputs of the first
layer.

Layer selection 

The mean square error for the pseudo-outputs of the first
layer on the validation data set is given in the second
column of Table 9.4.1. The MSE for the pseudo-outputs $u_1$,
$u_2$, and $u_3$ are found to be close, and further rank tests
reveal these variables not to be collinear. Hence all these
pseudo-outputs of the first layer are considered as inputs
to the next layer.

Second layer 

The submodels of the second layer are modelled as follows.

$$ v_1 = -0.2737 + 1.4732\,u_1 - 0.5016\,u_2 - 0.6828\,u_1^2 - 0.5256\,u_2^2 + 1.2510\,u_1 u_2, $$

$$ v_2 = -1.0212 + 0.3353\,u_2 + 0.9661\,u_3 - 0.2140\,u_2^2 - 0.2358\,u_3^2 + 0.4507\,u_2 u_3, $$

$$ v_3 = -1.6663 + 0.1894\,u_1 + 1.4523\,u_3 - 0.2955\,u_1^2 - 0.4513\,u_3^2 + 0.7057\,u_1 u_3, $$

where $v_1$, $v_2$, and $v_3$ are the pseudo-outputs of the second
layer.

Layer selection and layer termination

The MSE on the validation data set, as given in the second
column of Table 9.4.1, shows $v_2$ producing minimum error
among the second layer outputs. The modelling procedure is
extended to the third layer, where the estimates obtained
are as follows.

$$ w_1 = -0.6435 + 0.0328\,v_1 + 1.2295\,v_2 - 0.0378\,v_1^2 - 0.1635\,v_2^2 - 0.1794\,v_1 v_2, $$

$$ w_2 = -0.2281 + 1.7239\,v_2 - 0.6083\,v_3 - 1.1046\,v_2^2 - 0.7283\,v_3^2 + 1.8224\,v_2 v_3, $$

$$ w_3 = 0.2557 + 2.5392\,v_1 - 1.6596\,v_3 - 0.2919\,v_1^2 + 0.3211\,v_3^2 - 0.0184\,v_1 v_3, $$

where $w_1$, $w_2$, and $w_3$ are the pseudo-outputs of the
prospective third layer; the corresponding MSE are shown in the
second column of Table 9.4.1, which confirms that $v_2$ leads
to the lower error model, compared to the third layer
outputs. So the model is terminated at the second layer.

Final model and the result 

Using all of the available (the first 69) data sets the
parameters of the submodels concerned are reestimated as
follows.

$$ u_1 = 5.3744 - 0.1998\,x_1 - 0.5577\,x_2 + 0.0099\,x_1^2 + 0.1727\,x_2^2 - 0.0336\,x_1 x_2, $$

$$ u_2 = 0.5658 - 1.2108\,x_2 + 1.5735\,x_3 + 0.4478\,x_2^2 - 0.0605\,x_3^2 - 0.2778\,x_2 x_3, $$

$$ v_2 = -0.3130 + 0.1275\,u_1 + 0.8515\,u_2 - 0.1716\,u_1^2 - 0.1958\,u_2^2 + 0.4029\,u_1 u_2. $$

The validation is assessed by computing the MSE for the
estimated model on the last 15 sets of data (which are not
used for modelling in any way). The results for all the
submodels are presented in the third column of Table 9.4.1;
once again $v_2$ is found to produce the minimum MSE (=1.0006)
on the validation data set (i.e. 70th to 84th data set).
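The way the two layers are chained to produce an estimate can be sketched as follows (plain Python). The coefficients are those of the reestimated final model as read from the text above; the sign of the cross term of the second submodel is not legible in the source and is assumed here to be negative, and any scaling applied to the measured variables is not known. The function names are assumptions.

```python
def quad(c, r1, r2):
    """Generic two-input quadratic submodel with coefficients c[0] ... c[5]."""
    return (c[0] + c[1] * r1 + c[2] * r2
            + c[3] * r1**2 + c[4] * r2**2 + c[5] * r1 * r2)

# First layer submodels (inputs in parentheses) and the selected second
# layer submodel; the last entry of U2 carries an assumed sign.
U1 = (5.3744, -0.1998, -0.5577, 0.0099, 0.1727, -0.0336)   # (x1, x2)
U2 = (0.5658, -1.2108, 1.5735, 0.4478, -0.0605, -0.2778)   # (x2, x3)
V2 = (-0.3130, 0.1275, 0.8515, -0.1716, -0.1958, 0.4029)   # (u1, u2)

def cod_estimate(x1, x2, x3):
    """Two-layer estimate: the first layer pseudo-outputs feed the second layer."""
    u1 = quad(U1, x1, x2)
    u2 = quad(U2, x2, x3)
    return quad(V2, u1, u2)

# Purely illustrative input values, not taken from the data of Appendix 9.
print(cod_estimate(0.0, 0.0, 0.0))
```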

Remarks: 

(1) The modelling and the validation data sets could be
selected randomly over the available data sets, since no
historic (i.e. time delayed) information is used here.

(2) In the present case, the individual submodels have not
been optimised. Cp statistic (discussed in Sec.3.6.4)
coupled with QR factorization based subset selection may be
used for this purpose.

Table 9.4.1 COD process model: MSE for GMDH-submodels

Pseudo-     Model fit      Validation fit     Validation fit
outputs     on first 54    on 55th to 69th    on 70th to 84th
            data sets      data sets          data sets

$u_1$       0.9749           4.7535           2.4659
$u_2$       1.1908           2.0994           1.1236
$u_3$       0.8902           1.7354           1.4076
$v_1$       0.7973          20.6531           1.3732
$v_2$       0.7299           1.9124           1.0006
$v_3$       0.6890           2.7979           1.6492
$w_1$       0.7099          10.5616           1.0789
$w_2$       0.6715           2.8795           1.0904
$w_3$       0.6633         474.6286           1.3800



Figure 9.4.1 Estimation of the Chemical Oxygen Demand 
(COD) in the Osaka bay using the two layer GMDH model.


9.5 A SINGLE LAYER NONLINEAR MODEL BASED ON
ORTHOGONAL TRANSFORMATION

The two basic features of GMDH are nonlinearization and
multilayer identification of linear-in-the-parameter models.
This section discusses a simplified single-stage method, 
where the linear as well as selected set of nonlinearly 
transformed variables are considered. The objective is to 
produce the best model in an information criterion sense. 
The method is presented as an illustrative example. 

Example 9.5.1 Modelling the yearly averaged sunspot series 

A method for the best subset AR modelling is presented in 
Sec.3.6.3. Here a linear-in-the-parameter nonlinear model is 
considered. 

For the sunspot series, the best subset AR model based
on 221 data points over the years 1700 to 1920 given by
(3.6.4) (see Example 3.6.3(1)) has three independent
variables y(k-1), y(k-2) and y(k-9), while the candidate set
of variables were {y(k-1), y(k-2), y(k-3),..., y(k-9)}. Here
in addition to these nine variables, the candidate set is
assumed to comprise all the quadratic terms involving
y(k-1), y(k-2), y(k-9). Thus the candidate set becomes

{y(k-1), y(k-2), y(k-3),..., y(k-9),
$y^2(k-1)$, $y^2(k-2)$, $y^2(k-9)$, and
y(k-1)y(k-2), y(k-2)y(k-9), y(k-1)y(k-9)}.

The same procedure as detailed in Sec.3.6.3 is followed,
which involves the use of subset selection and consideration
of the information criteria AIC and SIC. The best model is
obtained as

$$ y(k) = 1.3162\,y(k-1) - 0.4406\,y(k-2) - 0.1947\,y(k-3) + 0.1339\,y(k-8) + 0.0105\,y^2(k-2) - 0.0130\,y(k-1)y(k-2), $$

with AIC = 5.0656, and SIC = 5.1760.

Here the best model is obtained through an exhaustive
search on the selected subset with pseudorank 12, for which
both AIC and SIC work out to be minimum.

Remark

It may be noted that subset selection can take care of the
closeness to collinearity or the linear dependence among
the input variables (or their nonlinearized variants) but
their degree of correlation with the output (either singly
or jointly) is not taken into account. On the other hand,
the information criterion cannot detect collinearity
between the input variables (or regressors) but does take into
account the correlation between the input(s) and the output.
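An exhaustive search over subsets of candidate regressors, scored by information criteria, can be sketched as follows (Python with NumPy). The definitions AIC = ln(RSS/N) + 2p/N and SIC = ln(RSS/N) + p ln(N)/N are assumed normalized forms and may differ in detail from those of Sec.3.6.3; the function name and the synthetic data, in which only two of five candidates influence the output, are also assumptions.

```python
from itertools import combinations

import numpy as np

def best_subset(X, y, max_terms):
    """Exhaustive search for the subset of columns with the lowest AIC."""
    n = len(y)
    best = None
    for size in range(1, max_terms + 1):
        for cols in combinations(range(X.shape[1]), size):
            A = X[:, list(cols)]
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            aic = np.log(rss / n) + 2.0 * size / n            # assumed form
            sic = np.log(rss / n) + size * np.log(n) / n      # assumed form
            if best is None or aic < best["aic"]:
                best = {"cols": cols, "coef": coef, "aic": aic, "sic": sic}
    return best

# Synthetic data: the output depends on candidates 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.3 * rng.standard_normal(200)

best = best_subset(X, y, max_terms=5)
print("selected columns:", best["cols"])
print("AIC:", round(best["aic"], 4), "SIC:", round(best["sic"], 4))
```

The search cost grows exponentially with the number of candidates, which is one reason for reducing the candidate set by subset selection before the information criteria are applied.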


9.6 CONCLUSIONS 

GMDH is a powerful method for modelling nonlinear time 
series or input-output processes with limited data. 

The main strength of GMDH is in the breaking up of a 
complicated problem of identification of a process with 
multiple inputs and single output into a number of simpler 
problems of identification of submodels with one or two 
inputs and one output. The number of parameters of a 
submodel being only a few (typically five), only a small 
data set is required for the identification of the complex 
process. The varied implementations of GMDH reported in the 
literature underline the application potential of this 
method. 

It is expected that the use of orthogonal trans- 
formation and subset selection in the implementation of GMDH 
can lead to reduction in the heuristics and redundancy in 
the model structure, although more work needs to be done in 
this direction. 
CHAPTER 10


MODELLING AND PREDICTION OF NONLINEAR PROCESSES 
USING NEURAL NETWORKS 


Nonlinear series with or without periodicity as well 
as nonlinear input-output processes can be modelled 
using neural networks. 


10.1 INTRODUCTION 

Neural networks offer some of the most versatile ways of 
modelling nonlinear processes of a diverse nature. A neural 
network attempts to mimic the functioning of the brain in a 
crude but simplistic manner. The nonlinear relationship 
between the input(s) and the output(s) is modelled using a 
number of basic blocks, called neurons or nodes. The nodes 
are interconnected and are usually arranged in multiple 
layers. Each internodal link or interconnection is weighted.
At each node, the weighted inputs (from other nodes or from 
external inputs to the network) are summed together with an 
external bias known as the threshold, and the result is 
passed through a nonlinear function (also known as the 
activation function), which forms the output of the node. 
The nonlinearity associated with each node remains fixed. 

The weights on the interconnections are estimated 
iteratively by a nonlinear optimization method using known 
sets of input and output data; such adaptation of the 
network is referred to as the training or the learning of
the network. The underlying idea is the biological nervous 
system-like performance of the network in learning complex 
processes. The main characteristic features are the network 
architecture, the nonlinearity associated with the nodes and 
the training method used. The designs are not unique, and 
there may be heuristics associated with the specification of 
the network structure. 

Various configurations are possible for the neural 
networks for different applications. The present study is 
confined to the feedforward network, which is one of the 
most popular architectures. Modelling and prediction of
nearly periodic and quasiperiodic time series as well as 
multi-input single-output processes are studied. 

There is little difference between the configuring and 
training of a feedforward network and the conventional 
identification problem in the case of an input-output 
process. So it is desirable that the network is 
parsimoniously designed. The optimization of the size of the 
network (in terms of the optimum number of nodes and links) 
using singular value decomposition (SVD) and subset 
selection has been explored. Two methods used are: 

(i) optimization through the selection of optimal set of 
time-domain inputs, and the optimal set of links and nodes 
within the neural network, 

(ii) optimization through orthogonalization of the data 
prior to use in the neural network in the case of series 
with periodicity. 

The organization of this chapter is as follows. The
basic features of neural networks are presented in Sec. 10.2.
The multilayer perceptron structure is introduced in Sec.
10.3, and the backpropagation algorithm for the adaptation
of the weights on the links and the thresholds is discussed.
The design of feedforward neural networks of optimal size is
treated in Sec. 10.4; the modelling of both time series and
complex input-output processes is considered.
Section 10.5 is devoted to the study of neural networks
operating with transformed input-output data, modelling
nearly periodic or quasiperiodic processes. Finally,
Sec. 10.6 introduces an SVD based method for the convergence
assessment during the training of the neural network, which
can be used as an alternative to the conventional approach
based on the output error.


10.2 BASICS OF NEURAL NETWORKS 
A node 

A node, which is the basic component of a neural network, is 
designed to mimic the understanding of the functionality of 
a neuron in the human brain. The inputs to a node 
(Fig.10.2.1) are the available measurements or the outputs 
from other nodes. Each input is treated as a connection or a 
link with which a weight is associated. Each node (Y) is
characterized by a nonlinear function (f(.)) and an additive
threshold value (F_Y). The node sums the weighted inputs and
the threshold value, and passes the result through its
characteristic nonlinearity to produce the output. The
threshold is used as an offset.

Figure 10.2.1 (a) The structure of a node, and
(b) typical activation functions: signum function,
ramp with saturation, and sigmoid function.

The three common types of nodal nonlinearities are as
follows:

(a) Signum function

\[
f(x) = \begin{cases} 1, & \text{if } x \ge 0, \\ -1, & \text{if } x < 0. \end{cases}
\tag{10.2.1}
\]

(b) Ramp with saturation

\[
f(x) = \begin{cases} 1, & \text{if } x \ge 1, \\ x, & \text{if } |x| < 1, \\ -1, & \text{if } x \le -1. \end{cases}
\tag{10.2.2}
\]

(c) Sigmoid function

\[
f(x) = \begin{cases} \dfrac{1 - e^{-x}}{1 + e^{-x}}, & \text{for } -1 \le f(x) \le 1, \\[2ex] \dfrac{1}{1 + e^{-x}}, & \text{for } 0 \le f(x) \le 1. \end{cases}
\tag{10.2.3}
\]


The sigmoid function is a differentiable nonlinearity and 
hence is suitable for continuously varying signals.
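The three nonlinearities and the node computation described above can be sketched in a few lines of Python. This is only an illustration: the function names are chosen here and do not come from the text, and the sigmoid forms assume the exponent -x in (10.2.3).

```python
import math

def signum(x):
    """Signum nonlinearity (10.2.1): +1 for x >= 0, -1 otherwise."""
    return 1.0 if x >= 0 else -1.0

def ramp(x):
    """Ramp with saturation (10.2.2): linear on (-1, 1), clipped outside."""
    return max(-1.0, min(1.0, x))

def sigmoid_bipolar(x):
    """Sigmoid with output in (-1, 1), first form of (10.2.3)."""
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))

def sigmoid_unipolar(x):
    """Sigmoid with output in (0, 1), second form of (10.2.3)."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, threshold, f=sigmoid_unipolar):
    """Sum the weighted inputs and the threshold, then apply the nonlinearity f."""
    s = sum(w * x for w, x in zip(weights, inputs)) + threshold
    return f(s)
```

For instance, `node_output([1.0, 2.0], [0.5, -0.25], 0.0)` sums to zero before the nonlinearity, so the unipolar sigmoid returns 0.5.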

Neural net structure 


(a) Input, output and hidden nodes 

Inputs to the input nodes and outputs from the output nodes 
are directly accessible from the external environment. The 
external inputs to the network are usually not weighted; 
all interconnections within the network are weighted. The 
signals sent out from output nodes can be directly read or 
measured. The hidden nodes are not directly accessible from 
the external environment; all input and output connections 
of these nodes are with nodes within the network only. 

In the usual configuration, the input nodes constitute the
input layer, and the output nodes constitute the output 
layer of the network. The hidden nodes may belong to one or 
more hidden layers within the network, which are not 
directly accessible by the inputs or the outputs. 

(b) Architectures 

Two basic architectures for neural nets are the feedforward
and the feedback architectures. Networks may also be
designed combining the features of both.

Feedforward architecture

A feedforward neural network has a multilayered structure.
The signals flow between the nodes only in the forward
direction, i.e. towards the output end (Fig. 10.2.2); nodes
of a layer can have inputs from nodes of any of the earlier 
layers. The weights on the nodal interconnections as well as 
the thresholds are adaptively adjusted to optimize the 
performance of the network. The popularly used multilayer 
perceptron has a feedforward architecture. 
Figure 10.2.2 A three layer perceptron or feedforward
network, with schematic arrangement for learning 
through backpropagation. 


Feedback architecture 

In the feedback architecture, the output from a node can 
flow in the forward direction (i.e. to nodes towards the 
output), or in the reverse direction, or may be fed back as
input to the same node itself (see Fig.10.2.3). Such 
networks are also called recurrent networks. 

(c) Network adaptation or learning

In neural networks, usually the nodal characteristics remain 
unchanged. The adjustment of the weights on the nodal 
interconnections and the thresholds is usually referred to
as training or learning of the network; this is analogous to 
the estimation of parameters in an identification problem. 
The learning may be supervised or unsupervised. Unsupervised
learning is based on maximization of some predefined
function or criterion. Supervised learning expects operator
intervention. It requires a training data set, comprising a
set of input data and a corresponding set of data for the
desired outputs; the learning is based on the minimization
of the error between the computed and the desired outputs.

Figure 10.2.3 A two-layer feedback neural network.

The basic error correction rule concerns adjustment of 
weights on the interconnections in proportion to the error 
between the computed outputs and the desired values of each 
node in the output layer. The gradient descent rule refers 
to adjustments of the weights such that the cumulative mean 
square error over the training set is minimized. The 
generalized delta rule or the backpropagation (learning)
algorithm is a multilayer learning algorithm based on the 
gradient descent approach, when the nodal nonlinearity is 
differentiable like the sigmoidal nonlinearity. 

Remarks 

(a) The neural net approach presumes knowledge as being 
built into the nodal interconnections rather than into the 
outputs from nodes, which may or may not be observable. 

(b) The interconnections ascribe robustness to the neural 
architecture which will work even if some nodes fail. 

(c) The ability of a multilayer network in mapping nonlinear
relationships comes from the nonlinearities within the
nodes. If the nodes had no nonlinearity, there would always
be a single layer network which could be functionally
equivalent to a multilayer network.
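Remark (c) can be checked numerically. The sketch below assumes NumPy and uses arbitrary random weights (the sizes 3, 4 and 2 are illustrative only): two layers without nonlinearity collapse into one layer whose weight matrix is the product of the two.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(4, 3))      # input (3 nodes) -> hidden (4 nodes) weights
F_Y = rng.normal(size=4)         # hidden-layer thresholds
W = rng.normal(size=(2, 4))      # hidden (4 nodes) -> output (2 nodes) weights
F_Z = rng.normal(size=2)         # output-layer thresholds

def two_layer_linear(x):
    """Two layers whose nodes have no nonlinearity."""
    return W @ (V @ x + F_Y) + F_Z

# Equivalent single layer: combined weights and combined thresholds.
W_eq = W @ V
F_eq = W @ F_Y + F_Z

def one_layer_linear(x):
    return W_eq @ x + F_eq

x = rng.normal(size=3)
same = np.allclose(two_layer_linear(x), one_layer_linear(x))
```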


10.3 MULTILAYER PERCEPTRON AND BACKPROPAGATION 
ALGORITHM 

A multilayer perceptron is a multilayer feedforward neural 
network having one input layer, one or more hidden layers 
and one output layer. 

It is found that a three-layer perceptron with the 
backpropagation learning algorithm can model a wide range of 
nonlinear relationships to a reasonable degree of accuracy. 
The ideal structure of the hidden layer and the inter- 
connections are still a subject of research; some results 
related to the optimization of the size of the feedforward 
networks are presented in Sec. 10.4. 

The generalized delta rule (GDR) or the backpropagation 
algorithm is due to Rumelhart, Hinton and Williams (1986) 
and is summarized below; detailed derivation follows in 
Appendix 10. 


10.3.1 Backpropagation Learning 

Consider a three layer perceptron (Fig. 10.2.2) with

(a) L nodes in the input layer X (h = 1 to L),

(b) M nodes in the hidden layer Y (i = 1 to M),

(c) N nodes in the output layer Z (j = 1 to N).

Refer to the desired or test outputs of the network as y_j,
and the output of, say, the i-th node of layer Y as O_Yi. The
objective is to train the network through adaptation of

(i) the weights W_ij and V_hi on nodal interconnections
between Y and Z layers and between X and Y layers
respectively, and

(ii) the threshold values F_Zj and F_Yi for the nodes of Z and
Y layers respectively.

The backpropagation learning proceeds as follows:

(1) Initialization:

Assign random values (say, between 0 and 1) to all the
weights W_ij and V_hi and the threshold levels F_Zj and F_Yi.
(2) Output computation:

Read a new set of input data to the network, O_Xh, and
compute

\[
\text{(i)} \quad O_{Yi} = f\left( \sum_{h=1}^{L} V_{hi} O_{Xh} + F_{Yi} \right), \tag{10.3.1}
\]

\[
\text{(ii)} \quad O_{Zj} = f\left( \sum_{i=1}^{M} W_{ij} O_{Yi} + F_{Zj} \right), \tag{10.3.2}
\]

where f(.) is the sigmoidal nonlinearity (10.2.3) between (0,1).

(3) Adaptation of weights:

Read a new set of data for the desired outputs y_j and
compute

\[
\text{(i)} \quad \Delta W_{ij} = \alpha\, O_{Yi} D_{Zj}, \qquad 0 < \alpha < 1, \tag{10.3.3}
\]

where

\[
D_{Zj} = (y_j - O_{Zj})\, O_{Zj} (1 - O_{Zj}),
\]

and

\[
\Delta W_{ij} = W_{ij}(\text{new value}) - W_{ij}(\text{last value}).
\]

\[
\text{(ii)} \quad \Delta V_{hi} = \beta\, O_{Xh} D_{Yi}, \qquad 0 < \beta < 1, \tag{10.3.4}
\]

where

\[
D_{Yi} = \left( \sum_{j=1}^{N} W_{ij} D_{Zj} \right) O_{Yi} (1 - O_{Yi}).
\]

(4) Adaptation of thresholds:

\[
\Delta F_{Zj} = \alpha D_{Zj}, \qquad \Delta F_{Yi} = \beta D_{Yi}.
\]

(5) Iteration:

Repeat by going to Step (2) and iterate to desired
convergence of the computed outputs O_Zj to the test outputs
y_j, which completes the neural network model.
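Steps (1) to (5) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the weights are updated pattern by pattern, a fixed number of epochs stands in for the convergence test of step (5), and the names `train_backprop` and `predict` and the array layout are chosen here for convenience.

```python
import numpy as np

def sigmoid(x):
    """Sigmoidal nonlinearity between 0 and 1, second form of (10.2.3)."""
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, Y, M, alpha=0.4, beta=0.4, epochs=2000, seed=0):
    """X: (patterns, L) inputs; Y: (patterns, N) desired outputs in (0, 1)."""
    rng = np.random.default_rng(seed)
    L, N = X.shape[1], Y.shape[1]
    # Step (1): random initial values between 0 and 1.
    V = rng.uniform(0.0, 1.0, size=(L, M))    # V[h, i]: X -> Y weights
    W = rng.uniform(0.0, 1.0, size=(M, N))    # W[i, j]: Y -> Z weights
    F_Y = rng.uniform(0.0, 1.0, size=M)       # hidden thresholds
    F_Z = rng.uniform(0.0, 1.0, size=N)       # output thresholds
    for _ in range(epochs):                   # Step (5), fixed epoch count
        for O_X, y in zip(X, Y):
            O_Y = sigmoid(O_X @ V + F_Y)              # (10.3.1)
            O_Z = sigmoid(O_Y @ W + F_Z)              # (10.3.2)
            D_Z = (y - O_Z) * O_Z * (1.0 - O_Z)
            D_Y = (W @ D_Z) * O_Y * (1.0 - O_Y)       # uses W before its update
            W += alpha * np.outer(O_Y, D_Z)           # (10.3.3)
            V += beta * np.outer(O_X, D_Y)            # (10.3.4)
            F_Z += alpha * D_Z                        # Step (4)
            F_Y += beta * D_Y
    return V, W, F_Y, F_Z

def predict(X, V, W, F_Y, F_Z):
    """Forward pass of the trained network for all rows of X."""
    return sigmoid(sigmoid(X @ V + F_Y) @ W + F_Z)
```

Calling `train_backprop` with `epochs=0` returns the initial random weights, which gives a convenient baseline for judging how much the training reduces the output error.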

Remarks 

(a) The positive constants α and β determine the learning
rates, which are heuristically chosen to be less than 1. In
the examples of this chapter, the values used are α=β=0.4.

(b) For improved convergence of the algorithm and smoother
weight changes, often an additional momentum term is used in
(10.3.3) and (10.3.4), which basically acts like a low-pass
filter (discussed in Appendix 14A). For example, at the k-th
iteration,

\[
\Delta W_{ij}(k) = \frac{(1-\gamma)\,\alpha\, O_{Yi} D_{Zj}}{1 - \gamma q^{-1}}, \qquad 0 < \gamma < 1.
\]
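Reading q^{-1} as the backward shift operator (an assumption about the notation, as is the symbol γ), the momentum expression is a first-order low-pass filter on the raw update g(k) = αO_Yi D_Zj, that is, ΔW_ij(k) = γΔW_ij(k-1) + (1-γ)g(k). A minimal sketch with a hypothetical function name:

```python
def momentum_updates(raw_updates, gamma=0.9):
    """Low-pass filter a sequence of raw updates g(k) = alpha * O_Yi * D_Zj.

    Implements dW(k) = gamma * dW(k-1) + (1 - gamma) * g(k), with dW(-1) = 0.
    """
    smoothed, prev = [], 0.0
    for g in raw_updates:
        prev = gamma * prev + (1.0 - gamma) * g
        smoothed.append(prev)
    return smoothed
```

The filter has unity gain for a constant input, so a steady raw update is eventually passed through unchanged, while rapid fluctuations between iterations are attenuated.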

10.3.2 Application Example 

The modelling of a nearly periodic process is discussed
here. More complex examples are given in Secs. 10.4 and 10.5.

A three layer neural network is used. Each of the
hidden and the output nodes comprises a summer with
threshold followed by sigmoidal nonlinearity (between 0 and 1).
The network is trained using the backpropagation algorithm.
The inputs to the network and the reference output are
normalized to lie within 0.3 and 0.7 (so that the data
remain approximately within the linear region of the
sigmoidal nonlinearity).
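The text does not state how the normalization is performed; a simple linear (min-max) mapping onto [0.3, 0.7] is one possibility, sketched below with hypothetical function names. The inverse mapping is needed to express the network predictions in the original units.

```python
def scale_to_range(values, lo=0.3, hi=0.7):
    """Linearly map values so that min -> lo and max -> hi (assumed scheme)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # Constant data: place everything at the middle of the range.
        return [(lo + hi) / 2.0 for _ in values]
    gain = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * gain for v in values]

def unscale(scaled, vmin, vmax, lo=0.3, hi=0.7):
    """Invert scale_to_range given the original minimum and maximum."""
    gain = (vmax - vmin) / (hi - lo)
    return [vmin + (s - lo) * gain for s in scaled]
```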

Example 10.3.2 Modelling and prediction of Trans-Atlantic
Airline passenger series

This series contains monthly air-traffic data over 12 years 
(see Appendix 7A.2); the data show yearly periodicity. 

There can be different ways of modelling the airline 
traffic series, depending on the objective. If the model has 
to produce 12 step ahead prediction, a typical representa- 
tion can be 

y(k) = f(y(k-12), y(k-24)).



Figure 10.3.1 A 2-11-1 feedforward network modelling
the Trans-Atlantic Airline traffic series.
o, a node with nonlinearity; •, a unity gain node.

Figure 10.3.2 (a) Training of the Air traffic series,
(b) One period ahead prediction for 10th to 12th years.

This process can be modelled by a three layer neural network
(Fig. 10.3.1) with 2 input nodes and 1 output node, with an
adequate number of hidden layer nodes (as discussed later).
Let the data for the 7th and 8th years be used as input and
the data for the 9th year be used as the output. The input
data constitute a 2×12 matrix, and the output data are a 12
element vector. Here the network has to learn 12 patterns,
so 11 hidden nodes are used. The internodal links and the
thresholds are initialized with random values between -1 and
1. The network is trained over 30,000 epochs (an epoch being
a training event corresponding to a single pass over the
entire training set). The trained network is used to predict
the air-traffic for the 10th year.
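The arrangement of the training data for y(k) = f(y(k-12), y(k-24)) can be sketched as below. The helper `yearly_patterns` is a hypothetical name, the patterns are stored as rows (the transpose of the 2×12 arrangement mentioned above), and the series used is a placeholder, not the airline data.

```python
import numpy as np

def yearly_patterns(series, in_years, out_year, period=12):
    """Build (period x len(in_years)) inputs and a period-long target.

    series: monthly data with year 1 first; years are counted from 1.
    """
    s = np.asarray(series, dtype=float)

    def year(n):
        return s[(n - 1) * period : n * period]

    X = np.column_stack([year(n) for n in in_years])
    y = year(out_year)
    return X, y

# Placeholder 12-year monthly series standing in for the real data.
series = np.arange(144, dtype=float)
# Columns ordered as y(k-12), y(k-24): year 8, then year 7; target is year 9.
X_train, y_train = yearly_patterns(series, in_years=(8, 7), out_year=9)
```

Each of the 12 rows of `X_train` is one pattern, which is why the epoch length equals the period in this example.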

Again the network is trained with the data for the 
8th, 9th and 10th years to predict the same for the 11th 
year. Similarly the traffic for the 12th year is also 
predicted. The training results over the periods 8th to 11th 
are shown in Fig.10.3.2(a), and the prediction results over
10th to 12th periods are shown in Fig.10.3.2(b).

Remarks 

The number of nodes required in the hidden layer depends on
the inputs of the network as well as the number of patterns
to be learnt (which is the same as the epoch length in the
present context). For N input nodes, usually 2N+1 hidden
nodes are used; again, for learning M different patterns, a
maximum of M-1 hidden nodes are required, although the
optimum number of hidden layer nodes required can be much
less, as discussed in the following two sections.


10.4 DESIGN OF OPTIMUM NETWORKS USING SVD AND 
SUBSET SELECTION 

In any method of modelling, overparameterization or redun- 
dancy in the structure is undesirable. A neural network will 
be overparameterized if the number of links is excessive. 
In such cases, if the training set of data are not noise- 
free, the network will tend to learn the information along 
with noise in the data leading to poor validation results. 

For an optimum design, the neural network should have 
the optimum number of inputs, and the optimum number of 
links and nodes within the network. This section addresses 
two basic questions, namely (i) which of the candidate
inputs to the network constitute the optimal set of inputs, 
and (ii) at the hidden layer(s), which nodes and links 
between the post-hidden layer stage and the subsequent 
stage(s) are essential for the design of the network with 
optimum size. The singular value decomposition (SVD) and 
subset selection (Sec.3.6.2) based on QR with column 
pivoting (QRcp) factorization are used for the design of 
feedforward networks of optimum size. 
10.4.1 Determination of Optimum Architecture
Input nodes 

The basic problem is to determine the optimum number of 
inputs that should be considered. This involves subset 
selection from the data sets of all candidate inputs for the 
selection of the input variables which contain significant 
information. SVD followed by QRcp factorization can be used 
for subset selection as discussed in Sec.3.6.2.

Optimum number of hidden nodes and links 

At any post hidden layer stage, the links that are connected 
with any subsequent layer act as inputs to those layers. It 
is desirable to eliminate those links which are redundant, 
or which carry relatively insignificant information. 

Two possible network architectures are considered: 

(a) Homogeneous networks: Here all the input nodes are 
connected with all the hidden nodes only (e.g., see Fig. 
10.4.1a). This is the most popular architecture. 

(b) Nonhomogeneous networks: This covers all the designs
excluding the homogeneous networks. In these architectures,
all possible combinations of inputs are fed to the hidden
layer nodes; the direct links bypassing the hidden layer
nodes are also permitted (e.g., Fig.10.4.2a).

In the first case SVD can be used to determine how many 
hidden layer nodes should be used; in the second case subset 
selection can be performed to determine which links between 
the hidden layer stage and the subsequent stage can be eli- 
minated without any appreciable degradation of performance. 

The problem of modelling a multi-input single-output 
process using a 3-layer neural network is detailed below. 

Design procedure 

Let there be n candidate inputs (i.e. the input vector is 
n-dimensional) and one output; suppose there are m sets of 
input and output data available. 

(1) First a candidate network is considered, which may be 
exhaustive or overparameterized but not underparameterized. 
This network is iterated to crude convergence (explained 
later). 

(2) Suppose the number of links between the hidden layer
and the subsequent layer (i.e. the number of links to the
output layer in the present case) is r; the r links are
referred to as the pseudo-outputs of the hidden layer. Here
r includes the number of hidden layer nodes as well as the
number of direct links bypassing the hidden layer (that is,
the direct links between the earlier layer(s) and layer(s)
following the concerned hidden layer). For each of the m
input data sets to the network, compute the r pseudo-outputs
of the hidden layer; an m×r matrix B is formed
corresponding to the m sets of input data.

(3) SVD is performed on B: B = U_B S_B V_B^T. The number of
dominant singular values in S_B (say g, g ≤ r), indicating the
rank or approximate rank of B, will indicate the number of
links that should be retained.

Remark: For homogeneous networks, the selection stops
here. For nonhomogeneous networks, the selection procedure
continues as follows.

(4) QRcp factorization is performed on the g×r matrix V̄_B^T for
subset selection, and the specific g of r links between the
hidden layer and the subsequent layer to be retained are
identified; here V̄_B contains the first g columns of V_B.

The reduced-size network is reinitialized and retrained to 
the desired convergence. 
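Steps (3) and (4) can be sketched in NumPy as follows. The number g of dominant singular values is passed in by the user, since the text chooses it by inspecting the singular value distribution. The column pivoting is written here as a pivoted Gram-Schmidt loop to keep the sketch dependent on NumPy alone; it is meant to mimic the pivot choice of a QRcp routine, and the function names are hypothetical.

```python
import numpy as np

def qrcp_pivots(A, g):
    """Indices of g columns of A picked by column pivoting.

    At each stage the column of largest residual norm is selected and
    its direction is removed from all columns (assumes g <= rank of A).
    """
    R = np.array(A, dtype=float)
    chosen = []
    for _ in range(g):
        norms = np.linalg.norm(R, axis=0)
        for c in chosen:
            norms[c] = -1.0                 # never pick a column twice
        j = int(np.argmax(norms))
        chosen.append(j)
        q = R[:, j] / norms[j]
        R -= np.outer(q, q @ R)             # deflate the chosen direction
    return chosen

def select_links(B, g):
    """Step (3): SVD of B. Step (4): pivoting on the first g rows of V^T."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return s, qrcp_pivots(Vt[:g, :], g)

# Synthetic m x r matrix: links 1 and 3 carry information, 0 and 2 almost none.
rng = np.random.default_rng(0)
a, b = rng.normal(size=50), rng.normal(size=50)
B = np.column_stack([1e-6 * rng.normal(size=50), a,
                     1e-6 * rng.normal(size=50), b])
s, links = select_links(B, g=2)
```

In this synthetic case the singular values drop sharply after the second one, and the two informative columns are the ones retained.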

Justification 

The objective is to eliminate collinearity or near- 
collinearity between the different variables (or links 
carrying information), which is done using the numerically 
robust approach of SVD and QRcp factorization. The selection 
can be unique if the distribution of the singular values 
shows a large drop (as explained in Sec.3.6.2). Following 
(7.6.6), the squared sum of the eliminated singular values 
quantifies the part of the available energy in the data, 
which is rejected in the subset selection procedure. 
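As a small numerical illustration, the rejected fraction can be computed from the squared singular values. Equation (7.6.6) is not reproduced in this section, so the sketch relies on the standard fact that the sum of squared singular values equals the squared Frobenius norm of the matrix. The values used are the 20-epoch row of Table 10.4.4 (the COD example later in this section), with g = 4; the function name is chosen here.

```python
def rejected_energy_fraction(singular_values, g):
    """Fraction of the total squared singular values that is dropped
    when only the first g singular values are retained."""
    total = sum(s * s for s in singular_values)
    rejected = sum(s * s for s in singular_values[g:])
    return rejected / total

# Singular values at 20 epochs for the 3-6-1 network (Table 10.4.4).
s_20 = [7.58, 0.56, 0.26, 0.17, 0.007, 0.004]
fraction = rejected_energy_fraction(s_20, g=4)
```

With these values the rejected fraction is of the order of one part in a million, which is consistent with retaining four hidden nodes.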

Remarks 

(1) The state of 'crude convergence' mentioned in step (1)
above is not uniquely defined. The selection is expected to
be meaningful when the distribution of the singular values
shows a relatively large drop. According to experimental
verifications, the selection may be possible quite early in
the training.

(2) If there is more than one hidden layer, the selection
procedure starts with the first hidden layer, and the whole
exercise is repeated sequentially for each subsequent hidden
layer towards the output.

(3) If only one set of input data is available for the
training of the network (i.e. m=1), B at step (2) above can
be formed from the outputs at the post hidden layer stage
for m (>1) consecutive iterations. Step (3) and step (4) can
follow subsequently the same way as above.

(4) There are alternative approaches for the reduction of
the size of the network. For example, Karhunen-Loève
transformation (KLT) and principal component analysis based
methods have been proposed; both these are eigenvalue based,
which are numerically less robust than the singular value
based methods.


10.4.2 Modelling and Prediction of Mackey-Glass Series

The Mackey-Glass equation (Mackey and Glass, 1977), which
models the nonlinear oscillations occurring in physiological
processes, has been discussed in Sec.8.3.1 and Example
8.3.2. Consider a discrete-time representation of the
Mackey-Glass (MG) equation given by

\[
x(k+1) - x(k) = \frac{0.2\, x(k-\tau)}{1 + x^{10}(k-\tau)} - 0.1\, x(k), \qquad \tau = 17, \tag{10.4.1}
\]

which generates a quasiperiodic series as shown in Example
8.3.2. The objective is to model the MG series and to
produce multistep ahead predictions.
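The recursion (10.4.1) can be iterated directly. In the sketch below the constant initial history `x0 = 1.2` is an assumption made for illustration (the text does not state the initial conditions), and the function name is hypothetical.

```python
def mackey_glass(n, tau=17, x0=1.2):
    """Generate n samples of the discrete Mackey-Glass series (10.4.1).

    x(k+1) = x(k) + 0.2 x(k-tau) / (1 + x(k-tau)**10) - 0.1 x(k)
    The first tau+1 values are set to x0 (assumed initial history).
    """
    x = [x0] * (tau + 1)
    for _ in range(n):
        xk, xd = x[-1], x[-1 - tau]          # current and delayed values
        x.append(xk + 0.2 * xd / (1.0 + xd ** 10) - 0.1 * xk)
    return x[tau + 1:]

series = mackey_glass(500)
```

For a positive starting history the recursion keeps the series positive and bounded, since the delayed term is nonnegative and never exceeds about 0.145.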

Here the series {x(k)} can be expressed as

x(k+p) = f(x(k), x(k-τ), x(k-2τ), ..., x(k-(N-1)τ)), (10.4.2)

where p is the prediction time which can be chosen depending 
on short-term or long-term prediction; it has been shown 
(Lapedes and Farber, 1987) that N can be typically between 4 
and 8. In the present case N=6 is chosen; another reason for 
this choice is that the series (10.4.1) has a pseudo period 
length varying between 95-102, which is almost covered by 
the input data sets in (10.4.2). 

Three-layer feedforward networks are used with
sigmoidal nonlinearity (between 0 and 1) and the
backpropagation algorithm is used for training. A six input
network is considered where x(k), x(k-τ), ..., x(k-5τ), with
τ = 17, are used as the inputs and x(k+p) is used as the
output. For all exercises a 300×6 data set is used for
training, and the subsequent 200×6 data set is used for the
validation test through 6-step ahead prediction with p=6. The
input data set may be represented by the matrix

\[
\begin{bmatrix}
x(k-6) & x(k-23) & x(k-40) & x(k-57) & x(k-74) & x(k-91) \\
x(k-7) & x(k-24) & x(k-41) & x(k-58) & x(k-75) & x(k-92) \\
x(k-8) & x(k-25) & x(k-42) & x(k-59) & x(k-76) & x(k-93) \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots
\end{bmatrix};
\]

the corresponding 300×1 output data set is given by
y = [x(k) x(k-1) x(k-2) ... ]^T.
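The input matrix and output vector can be assembled from a stored series as sketched below. The helper name is hypothetical and the rows are ordered with k increasing, whereas the matrix above lists them with k decreasing; the pairing of each input row with its output is the same.

```python
import numpy as np

def lagged_dataset(x, p=6, tau=17, n_inputs=6):
    """Rows [x(k-p), x(k-p-tau), ..., x(k-p-(n_inputs-1)tau)] with target x(k)."""
    x = np.asarray(x, dtype=float)
    max_lag = p + (n_inputs - 1) * tau       # 91 for the default values
    ks = np.arange(max_lag, len(x))          # all k with a complete input row
    X = np.column_stack([x[ks - p - i * tau] for i in range(n_inputs)])
    y = x[ks]
    return X, y

# Placeholder series; its values equal their own time indices.
X_demo, y_demo = lagged_dataset(np.arange(200, dtype=float))
```

With the placeholder series, the first row lists the indices 85, 68, 51, 34, 17 and 0, which correspond to the lags 6, 23, 40, 57, 74 and 91 of the target at k = 91.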

The modelling exercises used follow. 


Exercise 1 Modelling of the Mackey-Glass series with a
homogeneous neural network

(a) Design and selection:

A network with 6 inputs, 11 hidden nodes and 1 output is
considered (which is referred to as a 6-11-1 network) (see
Fig.10.4.1(a)). Throughout the training, SVD is performed on
the 99×11 matrix B, which is a subset of the available
300×11 matrix at the post hidden layer stage, to determine
the optimum number of hidden nodes (the size of B is not a
limitation). The distribution of the singular values is
shown in Fig. 10.4.2; these results lead to the deduction
that 3 to 4 singular values are relatively dominant. So a
reduced 6-3-1 network (Fig.10.4.1(b)) is considered. Further
subset selection is not necessary since inputs to the nodes
are similar.

(b) Training and validation: 

Both the 6-11-1 and the reduced 6-3-1 networks are trained to
convergence and the validation is tested. The results
(Figs. 10.4.3, 10.4.4, and Table 10.4.3) show that the
convergence rate and the performance of the reduced network
are comparably good. In Fig.10.4.3 both the fit for the
training data set and the 6-step ahead prediction results are
presented.

Figure 10.4.1 Neural networks modelling the MG series:
(a) 6-11-1 network, and (b) the reduced 6-3-1 network;
(c) 4-19-1 network, and (d) the reduced 4-6-1 network.
o, a node with nonlinearity; •, a unity gain node.


Exercise 2 Modelling of the Mackey-Glass series with an
optimal network

(a) Input selection:

Subset selection is performed on different blocks of data
from the 300×6 input data set and the optimal set of 4
inputs in (10.4.2) is determined (Table 10.4.1).

(b) Design and selection:

An exhaustive 4-19-1 network is considered as shown in Fig.
10.4.1(c). The network is trained with the selected 300×4
data set. During the training, corresponding to 90 input
data sets (out of 300), the respective magnitudes of the
variables at the links at the post hidden layer stage are
used to form the 90×19 matrix B. At some selected points
subset selection is performed on B. The results (Table
10.4.2) show that out of the 19 links at the
post-hidden-layer stage, only 6 specific links carry the
major part of the information, leading to the reduced 4-6-1
network shown in Fig. 10.4.1(d).

Figure 10.4.2 Singular value distribution of B at post
hidden layer stage during the training of 6-11-1 network.

(c) Training and validation: 

Both the 4-19-1 and the reduced 4-6-1 networks are trained to
convergence and the validation is tested; the results
(Fig.10.4.5 and Table 10.4.3) show that the performance of
the reduced network is equally good.

Figure 10.4.3 NMSE during training of MG series.
6-11-1 network, 6-3-1 network.

Figure 10.4.4 MG series through 6-11-1 and 6-3-1 networks.
original data, 6-11-1, 6-3-1 networks.

Figure 10.4.5 MG series through 4-19-1 and 4-6-1 networks.
original data, 4-19-1, 4-6-1 networks.
Table 10.4.1 Selection of optimal input set modelling the
MG series (out of 6 candidate inputs)

Data blocks        Singular values          Selected input variables

(1 to 90) × 6      12.8, 3.4, 3.2, 1.3,     x(k-2τ), x(k-5τ),
                   0.880, 0.701             x(k), x(k-4τ)

(200 to 290) × 6   12.8, 3.4, 3.1, 1.3,     x(k-2τ), x(k-5τ),
                   0.864, 0.805             x(k), x(k-4τ)


Table 10.4.2 Selection of optimal links at hidden layer of
4-19-1 network (Fig.10.4.1(c))

Epochs   Singular values of              Selected links
         90×19 matrix B

10       28.1, 1.3, 0.9, 0.7, 0.6,       16, 18, 17, 19, 15, 2
         0.2, 0.115, ..., 0.003

1000     22.2, 3.3, 2.6, 1.3, 0.6,       15, 18, 16, 17, 19, 2
         0.4, 0.126, ..., 0.002


Table 10.4.3 Normalised RMSE (√(MSE/variance)) for 6-step
ahead prediction over the data set: 301-500

Network:   6-11-1   6-3-1   4-19-1   4-6-1

NRMSE      0.137    0.092   0.139    0.152
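The normalised RMSE of the table heading can be computed as sketched below. The sketch assumes that the variance is that of the actual (reference) series over the evaluation window and that the population form of the variance is used; the text does not specify either point.

```python
import numpy as np

def nrmse(actual, predicted):
    """Normalised RMSE: sqrt(MSE / variance of the actual series)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mse = np.mean((actual - predicted) ** 2)
    return float(np.sqrt(mse / np.var(actual)))
```

Under this definition a predictor that always outputs the mean of the actual data scores 1, so values well below 1, as in the table, indicate that the network captures most of the variation in the series.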


10.4.3 Modelling of Chemical Oxygen Demand (COD) in 
Osaka Bay 

The Mackey-Glass series modelled in Sec.10.4.2 was a case
where the data were noise-free. Here is an example of a
real-life input-output process with noisy data.

The COD process in the Osaka Bay (Appendix 9) is a
three input, one output process. In Sec.9.4, this process
has been modelled using GMDH; here neural network models are 
considered. Out of the available data, the first 60 sets are 
used for modelling the network and the next 23 sets are used 
for validation tests. 
Figure 10.4.6 Nonhomogeneous neural networks modelling
the COD process:
(a) 3-12-1 network, and (b) the optimised 3-4-1 network.


Exercise 1 Modelling of the COD process using a homogeneous 
neural network 

A homogeneous network with 3 input nodes, 6 hidden layer 
nodes and 1 output node is considered (the network is 
structurally similar to Fig.10.4.1(a)). The training is
performed for 60 input data sets which are fed to the 
network sequentially. The matrix B formed from the data at 
the post hidden layer stage during the course of training is 
SV-decomposed. The distribution of the singular values 
(presented in Table 10.4.4) shows only four singular values 
being relatively dominant; so 4 hidden nodes should be 
adequate for modelling. 

Both the 3-6-1 network and the reduced 3-4-1 network
are trained to convergence, and the validation is tested.
The output estimation errors for the input data over the
validation period are presented in Table 10.4.6.


Exercise 2 Modelling the COD process with an optimised
structure

A 3-12-1 network with exhaustive choice of connections
between the input and output layers is considered
(Fig.10.4.6(a)). At different stages of training, SVD
followed by subset selection is performed on matrix B at the
post hidden-layer stage. It is found that 4 out of the 12
links carry significant information (see Table 10.4.5),
which are links 4, 7, 3 and 10; this led to a 3-4-1 network
(Fig. 10.4.6(b)).

Both the 3-12-1 and 3-4-1 networks are trained to 30,000
epochs; their comparative performances as shown in Fig.10.4.7
and Table 10.4.6 are found to be reasonably close.

Figure 10.4.7 Modelling and validation of COD (output)
data (horizontal axis: months). 3-12-1 network, .... 3-4-1 network.

Remarks 

(1) The selection of the hidden layer links for the
optimised structure may not be unique over the learning
regime of the network, as in Exercise 2 on the COD process
(see Table 10.4.5). In the present case, however, the
candidate networks (4,7,9,10) and (4,7,3,10) show close
validation performance. Thus it is
expected that in a real-life situation, the presented method
for the optimised structure will lead to a solution with
significantly reduced size, although the closeness to
optimality is difficult to quantify, except through tests
such as validation performance.

(2) In both the MG series modelling (Sec.10.4.2) and the COD
process modelling, the SVD and subset selection based method
of producing reduced-size networks is found to work. As
expected, the reduced-size networks show comparable or better
performance than the oversized networks. In the COD process
case, the data being noisy, the reduced nonhomogeneous
network produces minimum error on validation test (see
Table 10.4.6).


Table 10.4.4 Selection of optimum number of hidden nodes
for 3-6-1 homogeneous network modelling the COD process

Epochs   Singular values                            No. of nodes

20       7.58, 0.56, 0.26, 0.17, 0.007, 0.004       4

200      8.43, 0.79, 0.41, 0.22, 0.012, 0.005       4

2000     11.72, 1.86, 0.59, 0.14, 0.044, 0.005      4


Table 10.4.5 Selection of optimal links at hidden layer of
3-12-1 network modelling the COD process

Epochs   No. of nodes   Singular values              Selected links
         selected

200      4              16.31, 1.15, 0.54, 0.29,     4,7,9,10
                        0.019, ...., 0.002

2000     4              9.60, 1.14, 0.54, 0.32,      4,7,3,10
                        0.04, ...., 0.002


Table 10.4.6 Normalized RMSE for estimation of COD over
the validation data sets

Network:   3-6-1    3-4-1    3-12-1   3-4-1

NRMSE      0.1696   0.1653   0.139    0.123


10.5 MODELLING NETWORKS WITH ORTHOGONALIZED DATA 

The neural networks designed in the last section used
measured input and output data in time (or spatial) domain.
This section studies neural network models for nearly
periodic series using orthogonalized data sets. The method
is also applicable for quasiperiodic series configured as
periodic series, as discussed in Sec.11.2.1.


10.5.1 Modelling Principle and Perspective

Periodic data series can be optimally compressed using SVD; 
this feature is used in present modelling procedures. 


Orthogonalization

An $m \times n$ matrix $A = USV^T$ can be orthogonalized as
$Z = AV = US$, where the columns $z_i$ of $Z$ are mutually
orthogonal $m$-vectors; for example,
$z_1 = [z_{1,1} \ldots z_{i,1} \ldots z_{m,1}]^T$. If Rank$(A) = p$,

$$A = \sum_{i=1}^{p} u_i s_i v_i^T = \sum_{i=1}^{p} z_i v_i^T. \qquad (10.5.1)$$

If $p = 1$, the i-th row of $A$ is given by $z_{i,1} v_1^T$.


Arrangements and analysis of the data 

The characterization of a nearly periodic series through 
singular value decomposition has been discussed in Sec. 
7.7.1; some of the main points are restated here. 

The time series data can be arranged into a matrix $A$
such that the consecutive periods are aligned into
consecutive rows. The degree of periodicity will be reflected
in the singular values of $A$. The ratio $s_1/s_2$ will be
infinite for a strictly periodic series; as the series
deviates from perfect periodicity this ratio decreases.
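
The behaviour of the ratio can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name `singular_value_ratio`, the test pattern and the noise level are illustrative choices and do not come from the text.

```python
import numpy as np

def singular_value_ratio(series, n):
    """Return s1/s2 for a series arranged with one period (length n) per row.

    Assumes the series length is a multiple of n and that there are
    at least two rows and two columns, so that s2 exists.
    """
    A = np.asarray(series, dtype=float).reshape(-1, n)
    s = np.linalg.svd(A, compute_uv=False)
    return np.inf if s[1] == 0 else s[0] / s[1]

pattern = np.array([1.0, 4.0, 9.0, 4.0])
periodic = np.tile(pattern, 6)                 # strictly periodic, 6 periods
rng = np.random.default_rng(0)                 # fixed seed, illustrative noise
perturbed = periodic + 0.5 * rng.standard_normal(periodic.size)

ratio_periodic = singular_value_ratio(periodic, 4)    # limited only by round-off
ratio_perturbed = singular_value_ratio(perturbed, 4)  # finite and much smaller
```

In floating-point arithmetic the ratio for the strictly periodic series is very large rather than exactly infinite, because the second singular value is a round-off residue.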


Modelling and prediction

When $s_1/s_2$ is large, $A = USV^T \approx u_1 s_1 v_1^T = z_1 v_1^T$,
where $v_1^T$ will be indicative of the periodic pattern, and
the elements of $z_1$ will be the scalar weights associated
with the respective rows of $A$. For the sake of simplicity
let the elements of $z_1$ be expressed as

$$z_1 = [z_1, z_2, \ldots, z_m]^T.$$

The series of elements $\{z_1, z_2, \ldots, z_m\}$ is modelled
using a neural network. The modelling scheme assumes that
$v_1$, the first right singular vector, remains almost
unchanged if an additional row is appended to $A$.

So, the one period ahead prediction of the time series
represented by $A$ will be given by $\hat{z}_{m+1} v_1^T$, where
$\hat{z}_{m+1}$ is obtained from the neural network.

The modelling scheme is presented in Fig.10.5.1(a). The
orthogonalized data are arranged for modelling as follows.

Figure 10.5.1 (a) Periodic series modelling scheme with
the neural networks operating with orthogonalized data:
time domain input (A), orthogonalization and truncation,
neural network, reverse transformation, time domain output.
(b) The input and output data configurations: the input
data sets $\{z_{m-r-d+1}, \ldots, z_{m-1}\}$ are fed to the neural
network, which produces the output data set
$\{z_{m-d+1}, \ldots, z_{m-1}, z_m\}$.


Consider a neural network with $r$ inputs and 1 output;
as shown in Fig.10.5.1(b), let the data be configured for an
epoch length of $d$ ($\geq 1$), where $(r+d) \leq m$. Thus the input
and the output data sets are suitably arranged from the data
sets $\{z_{m-1}, \ldots, z_{m-r-d+1}\}$ and
$\{z_m, z_{m-1}, \ldots, z_{m-d+1}\}$ respectively, for the training
of the network. The trained network is subsequently used with
the corresponding input set configured from the data
$\{z_m, z_{m-1}, \ldots, z_{m-r-d+2}\}$ to produce the predicted
output vector, which contains $\hat{z}_{m+1}$. The periodic
prediction is given by $\hat{z}_{m+1} v_1^T$.

The neural network may be homogeneous or nonhomogeneous,
and may be designed using the concepts discussed in
the earlier sections.

Summary 

(1) The $m$ consecutive periods (each of length $n$) of the
series are aligned into consecutive rows of an $m \times n$
matrix $A$.

(2) Compute $A = USV^T = ZV^T$; if $s_1 \gg s_2$, $A \approx z_1 v_1^T$.

(3) The series formed by the elements of $z_1$ is modelled
using (typically) an r-1-1 feedforward network ($r \leq m-1$).

(4) The trained network is used to produce the prediction
$\hat{z}_{m+1}$.

(5) The one period ahead prediction is produced as
$\hat{z}_{m+1} v_1^T$.

Remarks

(1) If more than one (SV-)decomposition components are
dominant in (10.5.1), separate neural networks will be
required for modelling each (see Sec.7.8.1).

(2) The main strengths of the present method are (a) the
modelling in terms of the SV-decomposition components which
carry compressed information by virtue of the transformation,
and as a result the network size is substantially
reduced, and (b) the nonlinear modelling of the elements of
$z_1$ through the neural network, which makes the present
method different from the one presented in Sec.7.8.1.
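
The steps of the Summary above can be sketched in a few lines. This is only an illustration under stated assumptions: NumPy is assumed, the function name `periodic_predict` is hypothetical, and the neural network of step (3) is replaced by a linear least-squares autoregression on the elements of $z_1$, purely as a stand-in predictor so that the sketch stays self-contained.

```python
import numpy as np

def periodic_predict(series, n, r=2):
    """One-period-ahead prediction of a nearly periodic series.

    series : 1-D data whose length is a multiple of the period n
    n      : period length (row length of A)
    r      : number of past z-values used by the predictor
    Returns the predicted period (length n) and the singular values of A.
    """
    A = np.asarray(series, dtype=float).reshape(-1, n)   # step (1): m x n
    U, s, Vt = np.linalg.svd(A, full_matrices=False)     # step (2)
    z1 = U[:, 0] * s[0]      # scaling factors, one per period
    v1 = Vt[0]               # periodic pattern
    m = len(z1)
    # Step (3), stand-in for the neural network (an assumption of this
    # sketch): fit z1[t] from z1[t-1], ..., z1[t-r] by least squares.
    X = np.column_stack([z1[r - 1 - j : m - 1 - j] for j in range(r)])
    y = z1[r:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    z_next = z1[-1 : -r - 1 : -1] @ coef                 # step (4)
    return z_next * v1, s                                # step (5)
```

The sign ambiguity of the SVD does not affect the result, since the same sign appears in both $z_1$ and $v_1$ and cancels in the product.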


10.5.2 Modelling of the Indian Rainfall Series 

A series of spatially coherent rainfall over the
Northwestern and Central parts of India is considered
(Appendix 7F). The data for 40 years (from 1940) are used for
modelling. The data are monthly with yearly periodicity. The
series shows a poor degree of repeatability (see Fig.2.2.2)
with $s_1/s_2$ of the order of 6.

The data are arranged as shown in Fig.10.5.1(b); it is
found that an appropriate choice can be $d=34$ and $r=6$. A
3 layer network with sigmoidal nonlinearity at the hidden
and output nodes is considered, and the backpropagation
algorithm is used for training. A 6-3-1 homogeneous network
is used, which is trained through 5000 epochs. The
prediction result for 3 consecutive years is shown in
Fig.10.5.2.

Remark 

In the present context, the poor repeatability between the
periods may be due either to the variation of the periodic
pattern or to the variations in the scaling factors between
the patterns, or both. The present approach assumes the
pattern remains almost unchanged at $v_1$, so the variation in
the scaling factor (given by the elements of $z_1$) is modelled.

Figure 10.5.2 1 to 12 month ahead prediction of
rainfall over the period 1981-1983 (actual data and
predicted values).


10.6 ASSESSMENT OF CONVERGENCE USING SVD 

The convergence of a neural network during training is
usually assessed in terms of the output error. If an $m \times n$
input data set is used for training, the output errors for $m$
different sets of input have to be studied. SVD offers an
alternative approach to convergence assessment through the
assessment of rank-oneness of the output matrix over several
epochs as follows.

The training through one $m$-long epoch implies $m$
network-weight updates, which will produce an $m$-output
vector; let the corresponding reference output vector be
$y_R$. $g$ epochs will produce an $m \times g$ output matrix $Y$. At
ideally true convergence, all the columns of $Y$ will be
identical to $y_R$, and $Y$ will be of unit rank. On the other
hand, during the training, before convergence is reached, the
weights on the links and thresholds will keep changing, and
in such a state, the columns of $Y$ will be different from
each other. So the degree of convergence of the network can
be expressed in terms of the distance of the output matrix $Y$
from rank-oneness, or in terms of the singular values as
follows.

Let the SVD of $Y$ be expressed as

$$Y = \sum_{i=1}^{p} u_{Yi} s_{Yi} v_{Yi}^T, \qquad (10.6.1)$$

where $p = \min(m,g)$. The ratio of the energy contained in the
most dominant component $u_{Y1} s_{Y1} v_{Y1}^T$ and the total
reference-output-energy is given by

$$c = s_{Y1}^2 / (g\, y_R^T y_R).$$

Ideally, at convergence $c = 1$, so the percentage of residual
energy at convergence can be expressed as

$$\kappa = 1 - c.$$

Remarks

(1) $\kappa$ will be insensitive to local minima if $g$ is large
enough to encompass such minima.

(2) The NN output and the reference data sets are mean
extracted before computing $\kappa$ to make $s_{Y1}$ insensitive to
the mean value for nonzero-mean data.
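
As an illustration, the measure $\kappa$ can be computed as in the following sketch. NumPy is assumed, the function name `residual_energy` is hypothetical, and the mean extraction of Remark (2) is taken here to mean removing the mean of each column of $Y$ and of $y_R$ separately, which is one possible reading.

```python
import numpy as np

def residual_energy(Y, y_ref):
    """Return kappa = 1 - c for an m x g output matrix Y and reference y_ref.

    Each column of Y holds the network outputs of one epoch.
    """
    Y = np.asarray(Y, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    # Assumed form of the mean extraction: column-wise for Y, overall for y_ref.
    Yc = Y - Y.mean(axis=0, keepdims=True)
    yc = y_ref - y_ref.mean()
    s = np.linalg.svd(Yc, compute_uv=False)
    g = Y.shape[1]
    c = s[0] ** 2 / (g * (yc @ yc))   # energy of the dominant component
    return 1.0 - c
```

When every column of $Y$ equals $y_R$, the matrix is of unit rank with $s_{Y1}^2 = g\, y_R^T y_R$, so the function returns zero up to round-off.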


Example 10.6 Convergence assessment of the 6-3-1
homogeneous network modelling the Mackey-Glass series

This network is shown in Fig.10.4.1(b). At different stages
during the training of the network with an epoch length of
209, the output matrix $Y$ is formed with $g = 200$. The
convergence results are shown in Fig.10.6.1. The learning
activity shown by the $\kappa$ profile conforms to that shown by
the output error plot.


10.7 CONCLUSIONS 

The neural network is an extremely versatile method for
modelling and prediction, which is applicable to a large
number of problems. In addition, neural networks are
amenable to high speed processing through parallel
computing, which greatly extends the application prospects
for these networks.

A neural network is constructed with weighted
interconnections of the basic building blocks called nodes,
each of which represents simple and static nonlinearity. One
of the strengths of the neural network modelling is the
adaptive learning or weight-updating mechanism which is
based on an iterative nonlinear optimization technique. The
networks are much more flexible, fault-tolerant and powerful
than GMDH models discussed in the last chapter.






Figure 10.6.1 Assessment of the convergence during
training of the 6-3-1 network modelling the MG series:
(a) output error approach (b) SVD based approach.


As regards the size of the network, it is important
that the network is of optimum size; an oversized network is
expected to model the process as well as the noise, whereas
an undersized network will not be able to represent the
process dynamics faithfully. In this chapter the design of
networks with seemingly optimised structure using SVD and
QRcp factorization based subset selection has been discussed
and demonstrated; the high degree of numerical robustness of
these transforms is an important feature. In this
connection, it is also to be noted that a conventional
homogeneous network may not always be the best architecture
for a real-life process.

An SVD based method for the convergence assessment in
the training of the network has also been discussed. The
convergence over a number of epochs is estimated through the
rank-oneness assessment of the output matrix, which
can be more meaningful than the conventional output error
method.


CHAPTER 11


MODELLING AND PREDICTION OF QUASIPERIODIC SERIES


Quasiperiodic series can be modelled using SVD and
nonlinear modelling or nonlinear transformation and
linear modelling. Alternatively, periodic decomposition
and modelling of periodic components may be used.


11.1 INTRODUCTION 

Modelling of nearly periodic time series is quite
straightforward and can be done, for example, using the
singular value decomposition based method discussed in
Sec.7.6, or using the Box and Jenkins method discussed in
Sec.4.3. So if a quasiperiodic series can be configured into
multiple nearly periodic series through decomposition or
transformation, the modelling problem can be simplified; this
is the basic concept used for modelling quasiperiodic series
in this chapter.

As discussed in Sec.8.2.1, quasiperiodic series can be
characterized by

(a) certain periodicity which varies irregularly,

(b) the amplitude over the periodic segments which varies
irregularly, and

(c) the absence of any definite repeating pattern.

Three different methods of modelling are presented in this
chapter. All the methods model a quasiperiodic series in
terms of its constituent periodic components (which are not
necessarily sinusoidal) but the way the components are
extracted and individually modelled is different. All the
methods produce periodic (or pseudo-periodic) models which
may be used to produce one period ahead prediction. The
first two methods are conceptually similar; the modelling
involves singular value decomposition (SVD) coupled with two
different ways of incorporating nonlinearity in the
model, namely nonlinear transformation and nonlinear
modelling using a neural network. The third approach,
formulated in structural modelling framework, involves
direct decomposition into periodic components.

In the first method, the SV-decomposed data matrix is
assumed to constitute a relatively regular part and an
irregular part. The regular part is modelled as a nearly
periodic series as discussed in Sec.7.8.1. The irregular
part is nonlinearly transformed such that a further
relatively dominant regular part emerges along with an
irregular part, and this method of transformation and
segregation is continued until no further regular part can
be extracted. Each regular part is linearly modelled and the
linear prediction is successively reverse (nonlinearly)
transformed to produce additive components of the periodic
predictor. This modelling scheme is discussed in Sec.11.2.

The second approach uses the same concept of the
separation into a regular part and an irregular part,
but instead of a linear model a neural network is used to
model the selected regular part. This scheme is treated in
Sec.11.3.

The third approach is conceptually different from the
other two. Here the quasiperiodic series is decomposed into
multiple nearly periodic components using SVD. Each
component has its own fixed period length and a repeating
pattern, which may be differently scaled between the
periods. Each nearly periodic component is separately
modelled. Sec.11.4 details the modelling scheme based on
periodic decomposition. This section is also supported by
Appendix 11, where the Singular Value Ratio (SVR) spectrum
is introduced, which is used to determine the period length
of the most dominant periodic component in a quasiperiodic
series.


11.2 MODELLING USING SVD AND NONLINEAR 
TRANSFORMATION 

The basic idea is to model a quasiperiodic process as a 
combination of component processes, periodic in nature, but 
belonging to hierarchical levels of nonlinearly transformed 
spaces, where the period length of each component is the 
same. This method is applicable to series which show a 
certain degree of repetitiveness, although both the period 
length as well as the periodic pattern may vary. 


11.2.1 Data Preparation 

There are two main ways of arranging the data into a matrix
as follows.

(a) The maximum period length for the quasiperiodic
series is considered to be the row length ($n$) of
the data matrix $X$. The consecutive pseudo-periods are
aligned (say with respect to the peak or the trough etc.)
into the consecutive rows of $X$, and linearly interpolated
data are used for the rows shorter than the maximum length.

(b) The period length of the most dominant periodic 
component present in the data is considered to be the row 
length (n) of the data matrix X. The most dominant 
periodicity in the series is detected using the SVR spectrum 
(discussed in Appendix 11). The successive nearly periodic 
segments in the series are compressed or expanded to the 
length n as follows. 

Let $y(1), y(2), \ldots, y(n^*)$ be the data in a particular
segment of the original series, which are to be replaced by
the contracted or expanded data set $x(1), x(2), \ldots, x(n)$
where $n \neq n^*$; the transformation is given by

$$x(j) = y(j^*) + (y(j^*+1) - y(j^*))(r_j - j^*),$$

where

$$r_j = (j - 1)\,\frac{(n^*-1)}{(n-1)} + 1 \qquad (11.2.1)$$

and $j^*$ is the integer part of $r_j$. Thus the quasiperiodic
series $\{y(\cdot)\}$ with varying period length is converted into
an augmented series $\{x(\cdot)\}$ having the period length $n$. The
series $\{x(\cdot)\}$ is aligned into the rows of the data matrix $X$.
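
The compression or expansion of a segment can be sketched as follows, assuming NumPy; the function name `resample_segment` is illustrative. The only addition to (11.2.1) is a guard for the last point, where $r_j = n^*$ and the term $y(j^*+1)$ would fall outside the segment (its weight is zero there).

```python
import numpy as np

def resample_segment(y, n):
    """Contract or expand a segment y(1..n*) to length n using (11.2.1).

    Assumes n >= 2 and len(y) >= 2.
    """
    y = np.asarray(y, dtype=float)
    n_star = len(y)
    j = np.arange(1, n + 1)                        # 1-based index of x
    r = (j - 1) * (n_star - 1) / (n - 1) + 1       # r_j of (11.2.1)
    j_star = np.floor(r).astype(int)               # integer part of r_j
    j_next = np.minimum(j_star + 1, n_star)        # guard for j = n
    # Shift to 0-based indexing when reading from y.
    return y[j_star - 1] + (y[j_next - 1] - y[j_star - 1]) * (r - j_star)
```

The end points of the segment are preserved, and a segment that already has length $n$ is returned unchanged. The reverse transformation needed for prediction in the original time domain uses the same function with the roles of $n$ and $n^*$ exchanged.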


The modelling and prediction procedures are the same
irrespective of the way the data are arranged. The predicted
data have to be reverse transformed with respect to the
linear interpolation used or the compression or expansion
applied as stated above for producing prediction in the
original time domain.


11.2.2 Modelling and Prediction 

Let the quasiperiodic sequence be converted into an $M \times n$
data matrix $X$ as discussed earlier, and let an $m \times n$ data
matrix $A(k)$ be moving over $X$, where $m < M$. Define the kth
data window $A(k)$ by

$$A(k) = \begin{bmatrix} a_{k-m+1} \\ a_{k-m+2} \\ \vdots \\ a_k \end{bmatrix}. \qquad (11.2.2)$$

SVD is performed on $A(k)$ for each value of $k$. If SVD of
$A(k)$ produces $r$ dominant singular values,

$$A(k) = USV^T = ZV^T = \sum_{i=1}^{r} z_i(k) v_i^T(k) + \sum_{i=r+1}^{p} z_i(k) v_i^T(k), \quad p = \min(m,n) \qquad (11.2.3)$$

$$= A_L(k) + A_N(k) \qquad (11.2.4)$$

$$= \begin{bmatrix} a_L(k-m+1) \\ a_L(k-m+2) \\ \vdots \\ a_L(k) \end{bmatrix} + \begin{bmatrix} a_N(k-m+1) \\ a_N(k-m+2) \\ \vdots \\ a_N(k) \end{bmatrix} \quad \text{(say)}.$$

Thus $a_L(k)$ and $a_N(k)$ are the last rows of $A_L(k)$ and
$A_N(k)$ respectively. $A_L(k)$, which is composed of the
dominant decomposition components (in (11.2.3)), is the
regular part and the residual $A_N(k)$ is the irregular part
of the quasiperiodic process described by $A(k)$.

The objective is to produce the prediction

$$\hat{a}(k+1|k) = \hat{a}_L(k+1|k) + \hat{a}_N(k+1|k). \qquad (11.2.5)$$

$\hat{a}_L(k+1|k)$ is computed from the sequence $\{A_L(k)\}$. To
compute $\hat{a}_N(k+1|k)$, successive nonlinear transformation
and singular value decomposition are performed. Further
details of the prediction procedure follow.
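
The segregation in (11.2.3) and (11.2.4) can be sketched as follows, assuming NumPy; the function name `split_regular_irregular` is illustrative and the number of dominant singular values $r$ is supplied by the user.

```python
import numpy as np

def split_regular_irregular(A, r):
    """Split a data window A into its regular and irregular parts.

    Returns A_L (sum of the r dominant components), A_N (the residual),
    Z = US and V^T, so that A = Z V^T as in (11.2.3).
    """
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Z = U * s                      # columns z_i = u_i s_i
    A_L = Z[:, :r] @ Vt[:r]        # regular part, first sum in (11.2.3)
    A_N = A - A_L                  # irregular part, second sum in (11.2.3)
    return A_L, A_N, Z, Vt
```

The last row of `A_L` is $a_L(k)$, and the last row of `Z[:, :r]` holds the scaling factors $z_{mi}(k)$ whose evolution over $k$ is modelled in the prediction of the regular component.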

Prediction of the regular component $\hat{a}_L(k+1|k)$

The prediction policy followed for the regular component is
the same as discussed in Sec.7.8.1, which is briefly
restated below.

(1) When $r$ singular values are dominant, $r$ separate pattern
components constitute the regular part of the process.

(2) The i-th dominant component of $A_L(k)$ is given by
$(z_i v_i^T)(k)$, $1 \leq i \leq r$. It has two main parts. The
elements of $z_i$ define the scaling factors and the row
$v_i^T$ defines the pattern.

(3) It is intended to model the series generated by the
last (i.e. m-th) element of $z_i(k)$ for different values
of $k$.

(4) The prediction of $a_L(k+1|k)$, the (m+1)th row of $A_L(k)$,
is computed as

$$(\hat{a}_L(k+1|k))^T = \sum_{i=1}^{r} \hat{z}_{mi}(k+1|k)\, v_i(k) \qquad (11.2.6)$$

where $\hat{z}_{mi}(k+1|k)$ is the one-step ahead prediction of
$\{z_{mi}(\cdot)\}$ produced at time $k$.

Remarks

(a) The i-th dominant component of $A(k)$ and $A_L(k)$ are the
same in (2) above.

(b) The modelling method assumes $v_i$ in (11.2.6) remains
unchanged between the windows $A(k)$ and $A(k+1)$.

(c) $r$ predictors are run simultaneously in
(11.2.6), where $r$ may be typically 1 or 2.

Prediction of the irregular component $\hat{a}_N(k+1|k)$

The fundamental step in the extraction of information from
$A_N(k)$ is to find a suitable nonlinear transformation which
enhances the dominance of a periodic component; this results
in an increased ratio of the singular values, $s_1/s_2$, for the
transformed matrix compared with that for $A_N(k)$. The
prediction policy can be summarized as follows.

(1) Generate an $m \times n$ matrix $B$ through a suitable nonlinear
transformation $f_{N1}(\cdot)$ of $A_N$:

$$B(k) = f_{N1}(A_N(k)). \qquad (11.2.7)$$

(2) Perform SVD of $B$ and segregate the regular and the
irregular parts:

$$B(k) = U_b S_b V_b^T = \sum_{i=1}^{r_b} u_{bi} s_{bi} v_{bi}^T + \sum_{i=r_b+1}^{p} u_{bi} s_{bi} v_{bi}^T, \quad p = \min(m,n)$$

$$= B_L(k) + B_N(k). \qquad (11.2.8)$$

(3) Compute the prediction $\hat{b}_L(k+1|k)$ due to $B_L$, the same
way as $\hat{a}_L(k+1|k)$ in (11.2.6):

$$(\hat{b}_L(k+1|k))^T = \sum_{i=1}^{r_b} \hat{z}_{bi}(k+1|k)\, v_{bi}(k), \qquad (11.2.9)$$

where $r_b$ is the number of singular values of $B_L(k)$ that are
dominant, and $\hat{z}_{bi}(k+1|k)$ is obtained from $\{B_L(\cdot)\}$, the
same way as $\hat{z}_{mi}(k+1|k)$ is obtained from $\{A_L(\cdot)\}$ in
(11.2.6). $v_{bi}(k)$ in (11.2.9) is assumed to remain unchanged
between the consecutive windows $B(k)$ and $B(k+1)$.

(4) Obtain $\hat{a}_N(k+1|k)$, due to $B_L(k)$, by reverse nonlinear
transformation of $\hat{b}_L(k+1|k)$:

$$\hat{a}_N(k+1|k) = f_{N1}^{-1}(\hat{b}_L(k+1|k)). \qquad (11.2.10)$$

(5) Compute the complete prediction using (11.2.6) and
(11.2.10):

$$\hat{a}(k+1|k) = \hat{a}_L(k+1|k) + \hat{a}_N(k+1|k). \qquad (11.2.11)$$

Remarks:

(a) The nonlinear transformation $f_{N1}$ is a one-to-one
transformation (i.e. which can be applied individually to each
element), e.g., the logarithmic transformation. It is
difficult to find the optimal nonlinear transform
mathematically (see for example Tukey, 1957), so a suitable
transformation may be determined experimentally. Some
transformations will require some preparatory operations,
e.g. in the case of logarithmic transformation, the
positivity of the data has to be ensured.

(b) To decide the suitability of a nonlinear transformation,
the effect of the reverse nonlinear transformation of
the prediction needs to be taken into consideration. In
terms of the singular values, the nonlinear transformation
$f_{N1}(\cdot)$ in (11.2.7) should satisfy the condition:

$$(s_{r+1}/s_{r+2}) < f_{N1}^{-1}(s_{b1}/s_{b2}),$$

where $s_{r+1}$ and $s_{r+2}$ are the singular values of $A(k)$ and
$s_{b1}$ and $s_{b2}$ are the singular values of $B(k)$.

(c) In (11.2.10), only the information from $B_L$ in the form
of $\hat{b}_L(k+1|k)$ is contained in $\hat{a}_N(k+1|k)$. Further
information may be extracted from $B_N(k)$ of $B(k)$ as follows.
Generate $C(k)$ through some suitable nonlinear transformation
$f_{N2}(\cdot)$ of $B_N(k)$:

$$C(k) = f_{N2}(B_N(k)) = C_L(k) + C_N(k), \qquad (11.2.12)$$

and repeat steps similar to (2) to (3) to compute
$\hat{c}_L(k+1|k)$, which is reverse transformed to

$$\hat{b}_N(k+1|k) = f_{N2}^{-1}(\hat{c}_L(k+1|k)).$$

Figure 11.2.1 Model schematic for the quasiperiodic
series using SVD and nonlinear transformation.


The sum of $\hat{b}_L(k+1|k)$ and $\hat{b}_N(k+1|k)$ is now reverse
transformed in (11.2.10) to compute $\hat{a}_N(k+1|k)$.

Summary of modelling and prediction scheme 

A schematic description of the model of the quasiperiodic
process is shown in Fig.11.2.1. The model can be summarized
as follows.

$$A(k) = A_L(k) + A_N(k)$$
$$= A_L(k) + f_{N1}^{-1}[f_{N1}(A_N(k))]$$
$$= A_L(k) + f_{N1}^{-1}[B(k)]$$
$$= A_L(k) + f_{N1}^{-1}[B_L(k) + f_{N2}^{-1}[f_{N2}(B_N(k))]]$$
$$= A_L(k) + f_{N1}^{-1}[B_L(k) + f_{N2}^{-1}[C(k)]]$$
$$= A_L(k) + f_{N1}^{-1}[B_L(k) + f_{N2}^{-1}[C_L(k) + C_N(k)]], \qquad (11.2.13)$$

Figure 11.2.2 The one (pseudo-)period ahead prediction
scheme

where $f_{N1}$, $f_{N2}$ etc. are nonlinear transformations;
$A_L(k)$, $B_L(k)$, $C_L(k)$ etc. are the relatively regular parts
of $A(k)$, $B(k)$ and $C(k)$ respectively obtained through SVD.
Whether further extraction of information is worth
considering will depend on the corresponding signal magnitude
of the error component due to $C_N(k)$ in the original time (or
spatial) domain, given by

$$e(k) = A(k) - \hat{A}(k), \qquad (11.2.14)$$

where

$$\hat{A}(k) = A_L(k) + f_{N1}^{-1}[B_L(k) + f_{N2}^{-1}[C_L(k)]]. \qquad (11.2.15)$$

The corresponding prediction law is given by

$$\hat{a}(k+1|k) = \hat{a}_L(k+1|k) + f_{N1}^{-1}(\hat{b}_L(k+1|k) + f_{N2}^{-1}(\hat{c}_L(k+1|k))), \qquad (11.2.16)$$

where $\hat{a}(k+1|k)$ is the prediction of the (m+1)th row of the
$m \times n$ matrix $A(k)$ at time $k$. The schematic diagram for the
prediction procedure is shown in Fig.11.2.2.
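
The prediction scheme with a single nonlinear level, that is (11.2.5) to (11.2.11), can be sketched as below. This is a sketch under several assumptions rather than the procedure of the text: NumPy is assumed; the names `dominant_part` and `predict_row` are hypothetical; the transformation $f_{N1}$ is taken as a logarithm applied after adding a constant `offset` that must keep all arguments positive; the scaling-factor sequences are extrapolated by a simple two-point linear rule where the text fits time series models; and windows are indexed by their last row.

```python
import numpy as np

def dominant_part(W, r):
    """Return last-row scaling factors, the r patterns and the regular part of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Fix the SVD sign ambiguity so that the scaling factors are comparable
    # between windows (assumes the pattern sums are not close to zero).
    sign = np.where(Vt[:r].sum(axis=1) < 0, -1.0, 1.0)
    Vr = Vt[:r] * sign[:, None]
    Zr = (U[:, :r] * s[:r]) * sign
    return Zr[-1], Vr, Zr @ Vr

def predict_row(X, m, r=1, rb=1, offset=60.0, extrapolate=None):
    """Predict the row following the last row of X (needs at least m + 1 rows)."""
    if extrapolate is None:
        extrapolate = lambda z: 2.0 * z[-1] - z[-2]   # placeholder predictor
    X = np.asarray(X, dtype=float)
    zs, zbs = [], []
    for k in range(m, X.shape[0] + 1):        # window made of rows k-m+1 .. k
        A = X[k - m:k]
        z, V, A_L = dominant_part(A, r)
        B = np.log(A - A_L + offset)          # (11.2.7) with f_N1 = log
        zb, Vb, _ = dominant_part(B, rb)
        zs.append(z)
        zbs.append(zb)
    a_L = extrapolate(np.array(zs)) @ V       # (11.2.6), patterns of last window
    b_L = extrapolate(np.array(zbs)) @ Vb     # (11.2.9)
    a_N = np.exp(b_L) - offset                # (11.2.10), reverse transformation
    return a_L + a_N                          # (11.2.11)
```

A different predictor can be passed through `extrapolate`; it receives an array with one row of scaling factors per window and returns the predicted row.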


11.2.3 Application Study using the Sunspot Series

Arranging the data

The yearly averaged sunspot data over the years 1700 - 1972
(Appendix 8A) are used in this study. The nearly repetitive
(pseudo-periodic) segments of the series are arranged into
the consecutive rows of $X$; the segments are considered to
start from the lowest points which are aligned into the
first column. The period lengths vary between 9 and 14; for
the present study the row-length of $X$ is chosen as 14. The
rows which are shorter in length are appended with linearly
interpolated values. $X$ works out to be a $23 \times 14$ matrix.

Remark 

The interpolation of the data is necessary in order to fill
up all the positions in the matrix $X$ such that SVD analysis
can be performed. It has been found that the interpolation
scheme does not affect the singular value decomposition
appreciably in comparison with that obtained without
appending data and using the minimum pattern length (i.e.
using $n$ as 9 instead of 14, in the present case).

A $4 \times 14$ data window $A(k)$ is considered moving over the
complete data set $X$. The first 20 rows of $X$ are used for
modelling, and the rest are used for validation through
prediction performance.

For consecutive positions of $A(k)$, SVD is performed on
$A(k)$. It is observed that $(s_1/s_2)(k)$ is of the order of 7
and that $v_1(k)$ and $v_2(k)$ remain fairly unchanged between
consecutive windows. So in the model (11.2.3), $r = 2$. The
progressive distribution of the singular values of $A(k)$ is
shown in Fig.11.2.4.

Reconstruction

The reconstruction of $A(1)$ in terms of $A_L(1)$ is shown in
Fig.11.2.3(a) for the years 1700 to 1743 along with the
original sunspot activity series. The interpolated data
points are eliminated and have not been shown in the figure.
Fig.11.2.3(b) shows the residual irregular part $A_N(1)$. As
expected, the regular part shows a repetitive pattern,
whereas the irregular part appears to be random.

Prediction of $\hat{a}_L(k+1|k)$ from the regular component $A_L(\cdot)$

The two basic problems are the prediction of $\hat{z}_{m1}(k+1|k)$ and

Figure 11.2.3 (a) The reconstruction of the sunspot
series given by $A_L(1)$, using the two most significant
singular values of $A(1)$ (actual data and reconstructed data;
sunspot numbers against the years 1700 to 1745).
(b) The residual irregular part $A_N(1)$ after separation
of the two most significant components $A_L(1)$ from $A(1)$.

$\hat{z}_{m2}(k+1|k)$. Since 20 cycles (pertaining to the years 1700
to 1922) are used for modelling, and since $A(k)$ is of size
$4 \times 14$, $k$ varies from 1 to 17. So 17 data points are
available for modelling each of the sequences $\{z_{m1}(\cdot)\}$
and $\{z_{m2}(\cdot)\}$ for the prediction of the 21st period. Note
that the $z_{m1}$ and $z_{m2}$ sequences are described by (11.2.3);
for example, in the present case $z_{m1}(k) = (u_{41} s_1)(k)$,
where $u_{41}$ is the 4th element of $u_1(k)$.

Computation of $\hat{z}_{m1}(k+1|k)$

It is found that $\{z_{m1}(\cdot)\}$ can be modelled as an ARIMA
(0,1,4) process:

$$\Delta z_{m1}(k) = D(q^{-1})\, e(k) \qquad (11.2.17a)$$

where $e$ represents white noise, $D(q^{-1})$ is a discrete-time
polynomial of order 4:

$$D(q^{-1}) = 1 + d_1 q^{-1} + d_2 q^{-2} + d_3 q^{-3} + d_4 q^{-4},$$

and $\Delta$ is the unit difference operator.

At each step $k$, the parameters of (11.2.17a) are
estimated and the prediction $\hat{z}_{m1}(k+1|k)$ is computed as

$$\hat{z}_{m1}(k+1|k) = z_{m1}(k) + D(q^{-1})\, \hat{e}(k), \qquad (11.2.17b)$$

where $e(k)$ is estimated as

$$\hat{e}(k) = \Delta z_{m1}(k) - \Delta \hat{z}_{m1}(k|k-1).$$

Computation of $\hat{z}_{m2}(k+1|k)$

It is found that the $\{z_{m2}(\cdot)\}$ process can be modelled by a
third-order AR model:

$$F(q^{-1})\, z_{m2}(k) = \eta(k), \qquad (11.2.18a)$$

where

$$F(q^{-1}) = 1 + f_1 q^{-1} + f_2 q^{-2} + f_3 q^{-3},$$

and $\{\eta(k)\}$ is the noise sequence. At each step $k$, the
parameters of (11.2.18a) are computed, and one-step ahead
prediction is produced as

$$\hat{z}_{m2}(k+1|k) = (1 - F(q^{-1}))\, z_{m2}(k). \qquad (11.2.18b)$$

The prediction $\hat{a}_L(k+1|k)$ is obtained from (11.2.6) as

$$(\hat{a}_L(k+1|k))^T = \hat{z}_{m1}(k+1|k)\, v_1(k) + \hat{z}_{m2}(k+1|k)\, v_2(k). \qquad (11.2.19)$$
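
A least-squares fit of the AR model (11.2.18a) and the resulting one-step prediction can be sketched as follows. NumPy is assumed and the function name `ar_one_step` is illustrative. The estimation method is not specified in this section, so ordinary least squares is an assumption, and (11.2.18b) is read here as $\hat{z}(k+1|k) = -f_1 z(k) - f_2 z(k-1) - f_3 z(k-2)$.

```python
import numpy as np

def ar_one_step(z, order=3):
    """Fit F(q^-1) z(k) = eta(k) by least squares and predict z(k+1).

    Returns the one-step ahead prediction and the coefficients
    [f_1, ..., f_order] of F(q^-1) = 1 + f_1 q^-1 + ... .
    """
    z = np.asarray(z, dtype=float)
    N = len(z)
    # Regress z(t) on z(t-1), ..., z(t-order); the regression
    # coefficients phi_j equal -f_j.
    X = np.column_stack([z[order - 1 - j : N - 1 - j] for j in range(order)])
    y = z[order:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    prediction = z[-1 : -order - 1 : -1] @ phi
    return prediction, -phi
```

With only 17 data points per sequence, as in the present study, the parameter estimates can be expected to be sensitive to noise.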

Prediction of $\hat{a}_N(k+1|k)$ from the irregular component $A_N(\cdot)$

It is found that logarithmic transformation is a suitable
transformation in the present case. A constant value of 60
is added to all the elements of $\{A_N(k)\}$ to ensure that all
the elements are positive before transformation. The $\{A_N(k)\}$
sequence is now logarithmically transformed to $\{B(k)\}$ and
the SVD of $\{B(k)\}$ is performed for $k$ = 1 to 17. The

Figure 11.2.4 The progressive distribution of (normalized)
singular values of $\{A(k)\}$ for the sunspot series.

Figure 11.2.5 The progressive distribution of singular
values of the log-transformed irregular part $\{B(k)\}$.


progressive distribution of the singular values is shown in
Fig.11.2.5. It is observed that there is sharp segregation
in the distribution with one dominant singular value,
whereas before transformation the data appeared to be random
as shown in Fig.11.2.3(b).

The prediction $\hat{a}_N(k+1|k)$ is computed using (11.2.8) to
(11.2.10). It was observed that only the 1st right singular
vector of $B(\cdot)$, i.e. $v_{b1}$, remains almost unchanged between
consecutive windows of $B(\cdot)$; so in (11.2.8), $r_b = 1$. The

Figure 11.2.6 Prediction (from 1 up to 14 step ahead
prediction) of the sunspot series over 21st to 25th
(pseudo-)periods from the year 1700.

sequence $\{z_{b1}(k)\}$ is modelled as

$$H(q^{-1})\, z_{b1}(k) = \eta(k), \qquad (11.2.20a)$$

where

$$H(q^{-1}) = 1 + h_1 q^{-1} + h_2 q^{-2} + h_3 q^{-3}.$$

The one-step ahead prediction is given by

$$\hat{z}_{b1}(k+1|k) = (1 - H(q^{-1}))\, z_{b1}(k), \qquad (11.2.20b)$$

where the parameters estimated from (11.2.20a) are used.

The prediction $\hat{b}_L(k+1|k)$ is computed using (11.2.9) as

$$(\hat{b}_L(k+1|k))^T = \hat{z}_{b1}(k+1|k)\, v_{b1}(k);$$

following (11.2.10), $\hat{b}_L(k+1|k)$ is antilog transformed and a
constant value 60 is subtracted from all the elements to
obtain the vector $\hat{a}_N(k+1|k)$.

Complete prediction 

The complete prediction is obtained using (11.2.16). The
results of one period ahead prediction for 5 cycles are shown
in Fig.11.2.6.

Remarks: The selection of a suitable nonlinear transformation
may involve a trial and error approach. The appropriate
transformation should enhance the dominance of the prime
singular value(s) (see concluding Remarks of Sec.8.4.2).


11.3 MODELLING USING SVD AND NEURAL NETWORK

The method discussed here is structurally similar to that
presented in the last section; the difference is that in the
individual spaces, instead of a linear model as in
Sec.11.2.1, the $\{z_{mi}(\cdot)\}$ sequences are modelled by neural
networks. Thus nonlinearity is incorporated in the modelling
in a new form.

Application 

Consider the problem of modelling of the sunspot series
discussed in Sec.11.2.3.

In the present case, three layer homogeneous neural
networks are used to model the $\{z_{mi}(\cdot)\}$ sequences; each of
the hidden nodes and the output node of the network has a
threshold level and has sigmoidal nonlinearity (between 0
and 1). The backpropagation algorithm is used to train the
network. A detailed discussion on the design of feedforward
neural networks is given in Chapter 10.

With reference to Sec.11.2.3, $\{z_{41}(k)\}$ is modelled
using a network with $z_{41}(k-1)$, $z_{41}(k-2)$, $z_{42}(k-1)$,
$z_{42}(k-2)$ as the inputs, and $z_{41}(k)$ as the output.
Initially 7 hidden nodes are used. After an initial stage of
training, it is



Figure 11.3.1 Prediction of the sunspot series using
SVD and neural network models, years 1923-1974 (actual
data and predicted values).

found that only 3 hidden layer nodes are necessary. So a
4-3-1 network was retrained to convergence through 10,000
iterations. For modelling of the $\{z_{42}(k)\}$ sequence a
network of similar structure was used.

The trained networks are used to produce the
predictions $\hat{z}_{41}(k+1|k)$ and $\hat{z}_{42}(k+1|k)$, and
$\hat{a}_L(k+1|k)$ is produced from (11.2.19). For $\hat{a}_N(k+1|k)$,
the same result as obtained in Sec.11.2.3 is used; the final
predictions are presented in Fig.11.3.1.

Remark: For modelling of the $\{z_{b1}(k)\}$ sequence in (11.2.20) a
nonlinear model can also be used. However, in the present
case, as the contribution of this component in the overall
prediction is not very significant, only the linear model is
considered.


11.4 MODELLING OF A QUASIPERIODIC SERIES THROUGH 
PERIODIC DECOMPOSITION 

11.4.1 Introduction to Periodic Decomposition 

A quasiperiodic time series or signal may be decomposed into 
a number of components each of which is periodic or nearly 
periodic in nature but not necessarily sinusoidal. The 
quasiperiodic series may be modelled by modelling the 
individual periodic components separately. The attributes of 
a constituent periodic component are 

(i) a fixed period length, and 

(ii) a fixed periodic pattern which may be differently 
scaled over the periods. 

Remark 

In the case of Fourier decomposition, each component is 
sinusoidal having a specific frequency, whereas in the case 
of periodic decomposition, each periodic component may 
consist of any number of sinusoidal components. 

Why periodic decomposition 

Often, a quasiperiodic time series or process may be gene- 
rated from a number of auxiliary processes each producing a 
periodic or nearly periodic pattern. Frequency domain 
analysis can decompose the composite signal into constituent 
sinusoids, whereas the different constituent periodic
components may contain overlapping bands of sinusoidal
components. For example, consider the ECG signal, picked up
from the
abdominal lead on an expectant mother (Sec.14.5.4). It shows 
the maternal ECG component along with the fetal ECG 
component. The two components are individually (nearly) 
periodic but mutually asynchronous. In such cases it is more 
meaningful to separate the two ECG components instead of 
analysing the sinusoidal components that make up the 
composite maternal ECG signal. 

The mechanism of periodic decomposition and the modelling
and prediction of quasiperiodic series through decomposition
into periodic components are detailed in this section; the
extraction of periodic components from composite signals is
treated in Sec. 14.5.

The mechanism of periodic decomposition 

The periodic decomposition involves a two-step procedure,
which is performed repeatedly to extract successive periodic
components in order of their energy content, as follows.

(1) Determine the period length $n_1$ of the strongest
periodic component present in the data series.

(2) Configure the data into a matrix $A$ having the row
length $n_1$, perform SVD of $A$: $A = USV^T$, and extract the
principal periodic component $A_{p1} = u_1 s_1 v_1^T$.

The residual data series is formed from $(A - A_{p1})$ and is
further decomposed by going through steps (1) and (2). This
procedure is continued until no further extraction is
possible. If altogether N different periodic components can
be extracted, the resulting decomposition is given by

$\{A\} = \{A_{p1}\} + \{A_{p2}\} + \dots + \{A_{pN}\}$,    (11.4.1)

where the components have N specific period lengths.

Remark: If more than one periodic component is detected
with the same period length, these components will be
mutually orthogonal.
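Step (2) of the procedure can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `extract_periodic_component`, the handling of trailing samples that do not fill a complete row, and the test signal are all introduced here for illustration.

```python
import numpy as np

def extract_periodic_component(y, n1):
    """Return the principal periodic component of y for row length n1.

    Assumption: trailing samples that do not fill a complete row are
    left out of the data matrix and appear as zeros in the component.
    """
    y = np.asarray(y, dtype=float)
    m = len(y) // n1                          # number of complete periods
    A = y[:m * n1].reshape(m, n1)             # one period per row
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_p1 = s[0] * np.outer(U[:, 0], Vt[0])    # A_p1 = u_1 s_1 v_1^T
    component = np.zeros_like(y)
    component[:m * n1] = A_p1.ravel()
    return component

# Hypothetical test signal: a fixed pattern of length 5 repeated with
# different scale factors, so that the data matrix has rank one.
pattern = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
scales = np.array([1.0, 1.2, 0.8, 1.5, 0.9, 1.1])
y = np.concatenate([c * pattern for c in scales])

y_p1 = extract_periodic_component(y, n1=5)
residual = y - y_p1    # series that would be passed to the next extraction
```

For this rank-one test signal the extracted component reproduces the series and the residual vanishes; with real data the residual would be analysed again for the next period length.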


11.4.2 Period Length Estimation for Periodic Components 

The period length can be determined, if unknown, using the 
singular value ratio (SVR) spectrum. 

The basic idea for the period length estimation is that
if the strongest periodic component in a series is of period
length $n_1$, then a matrix $A$ with row length $n_1$, formed
using the data, will produce a large value of $s_1/s_2$ on
singular value decomposition, where $s_1$ and $s_2$ are the
first two singular values of $A$.

The basic scheme for the estimation of $n_1$ is that the
data matrix $A$ is formed for various choices of row length
$n$, and the ratio $s_1/s_2$ is computed for each $A$. The
variation of $s_1/s_2$ against the row length $n$ will show
peaks at $n_1$ and its multiples; thus $n_1$ is estimated. The
generic term Singular Value Ratio (SVR) spectrum is used to
refer to the variation of $s_1/s_2$ against the row length
$n$; in place of $s_1/s_2$, $s_1^2/s_2^2$ or
$s_1/(s_1+\dots+s_p)$ may also be used, where
$p = \mathrm{rank}(A)$. In the present study the ratio
$s_1/s_2$, referred to as $\rho$, has been used. For a
detailed study of the SVR spectrum, please refer to
Appendix 11.
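The SVR spectrum described in this section can be sketched as follows. The function name `svr_spectrum`, the test signal (a zero-mean pattern of length 7 with a small amount of added noise) and the range of row lengths are assumptions made for illustration only; only the singular values are requested from the SVD routine.

```python
import numpy as np

def svr_spectrum(y, row_lengths):
    """Ratio s1/s2 of the data matrix for each candidate row length.

    Each row length must leave at least two complete rows, so that
    two singular values exist.
    """
    y = np.asarray(y, dtype=float)
    ratios = []
    for n in row_lengths:
        m = len(y) // n                           # complete rows only
        A = y[:m * n].reshape(m, n)
        s = np.linalg.svd(A, compute_uv=False)    # singular values only
        ratios.append(s[0] / s[1])
    return np.array(ratios)

# Hypothetical test series: period 7, zero-mean pattern, light noise.
rng = np.random.default_rng(0)
pattern = np.array([1.0, 3.0, -2.0, 0.5, -1.5, 2.0, -3.0])
y = np.tile(pattern, 60) + 0.05 * rng.standard_normal(420)

row_lengths = np.arange(3, 21)
spectrum = svr_spectrum(y, row_lengths)
best = int(row_lengths[np.argmax(spectrum)])   # a multiple of the period
```

Because peaks also appear at multiples of the period, the shortest row length among the conspicuous peaks would be taken as the estimate of $n_1$.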

Remark 

Since the peaks in the frequency spectrum of the composite 
series will show the overall contributions of the individual 
components in different frequencies, the Fourier spectrum 
cannot serve the same purpose as the SVR spectrum. 


11.4.3 Estimation of the Strongest Periodic Component 

The estimated periodic pattern primarily belongs to two 
categories: the principal SV-decomposed periodic pattern and 
the average energy pattern. 

Principal SV-decomposed periodic pattern 

Given any $m \times n_1$ matrix $A$, its principal periodic
pattern is given by

$A_{p1} = u_1 s_1 v_1^T = z_1 v_1^T$,    (11.4.2)

where $A = USV^T = ZV^T$, and $u_1$, $v_1$ and $z_1$ are the
first columns of $U$, $V$ and $Z$ respectively. The time
series given by $\{A_{p1}\}$ will have the same repeating
pattern given by $v_1$, which will be weighted by the factors
$\{z_{j1}\}$, where $z_{j1}$ is the j-th element of $z_1$
weighting the j-th row of $A_{p1}$.

The principal pattern series $\{A_{p1}\}$ can be modelled in
terms of $v_1$ and a time series given by the sequence of
elements $\{z_{j1}\}$, where the latter can be modelled by a
linear model (e.g., an AR model) or a nonlinear model (e.g.,
a neural network model).

Average energy pattern

The series with average energy pattern is a strictly
periodic series where each period contains the average row
energy of $A_{p1}$. The average energy pattern is studied in
Sec. 14.5.2.

The energy contained in the i-th row of $A_{p1}$ is given by

$E_i = [u_{i1} s_1 v_1^T][u_{i1} s_1 v_1^T]^T = u_{i1}^2 [s_1 v_1^T][s_1 v_1^T]^T$;

hence, using (14.5.1), the average energy periodic pattern
is given by

$\frac{1}{\sqrt{m}} s_1 v_1^T$,

where m is the column length of $u_1$.

Averaged pattern over a moving window 

Given an $M \times n_1$ matrix $A$, an $m \times n_1$ data
window $A'$ may be assumed to move over $A$, where $m < M$.
Conceptually this arrangement is the same as in Appendix
7A.2. There will be $M-m+1$ such windows. For each position
of $A'$, the prime component as in (11.4.2) is computed. An
average of the candidate patterns obtained from the
different windows is computed for each period and thus the
complete periodic sequence of period length $n_1$ is
constructed. This scheme may produce better results for
dynamic series.
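One possible reading of the moving-window scheme is sketched here. It assumes that "an average of the candidate patterns ... for each period" means that every row (period) of $A$ receives the mean of the rank-one estimates from all windows covering it; this interpretation, the function name `moving_window_component` and the test matrix are assumptions, not taken from the text.

```python
import numpy as np

def moving_window_component(A, m):
    """Average the rank-one estimate of each row over all m-row windows."""
    A = np.asarray(A, dtype=float)
    M, n1 = A.shape
    total = np.zeros_like(A)
    count = np.zeros(M)
    for start in range(M - m + 1):                # M - m + 1 windows
        window = A[start:start + m]
        U, s, Vt = np.linalg.svd(window, full_matrices=False)
        # prime component of this window, as in (11.4.2)
        total[start:start + m] += s[0] * np.outer(U[:, 0], Vt[0])
        count[start:start + m] += 1
    return total / count[:, None]                 # per-row average

# Hypothetical rank-one test matrix: one pattern, varying scale factors.
pattern = np.array([2.0, -1.0, 0.5, 1.5])
scales = np.array([1.0, 1.1, 1.3, 1.2, 0.9, 1.0])
A = np.outer(scales, pattern)
smoothed = moving_window_component(A, m=3)
```

For the rank-one test matrix every window reproduces its rows exactly, so the averaged result equals $A$; for a slowly changing pattern the windows would track the change locally.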


11.4.4 Implementation Considerations 

(1) Prime steps in implementation 

The periodic decomposition has two prime elements: the 
determination of period lengths and the estimation of the 
periodic components. The period lengths can be uniquely 
determined through procedures discussed in Sec. 11.4.2 but
the estimation of the periodic components largely depends on 
the type of data and the applications concerned. 

(2) Zero-meanness of components 

In (11.4.2), if $v_1$ is zero-mean, the corresponding time
series $\{A_{p1}\}$ will also be zero-mean.

(3) Nature of periodic patterns

A change in sign of the elements of $u_1$ or $z_1$ will result
in the inversion of the pattern in the corresponding time
series. This phenomenon results from the fact that, from an
algebraic point of view, there is no time series information
associated with $A_{p1}$; the time series information is
imparted through the modelling of the series $\{z_{j1}\}$.
There are two possibilities:

(a) The series $\{z_{j1}\}$ can be modelled even if the
elements of $\{z_{j1}\}$ have different signs.

(b) If the reversal of the pattern is not acceptable, one
may proceed as follows. (i) A nonzero constant value may be
added to the data to make it non-negative; the consequent
decomposition should be free from inversions of periods. The
decomposition will be meaningful if the energy in the
constant mean is much less than that in the rest of the
data. (ii) Alternatively, a periodic series with an average
energy pattern may be used; this approach is preferable for
non-prime periodic components.

(4) Chances of modelling noise 

With successive extraction of periodic components, the 
residual becomes more and more noisy; in such cases care 
should be taken not to model the noise as periodic 
component(s). 

(5) SVR spectrum analysis 

(a) The SVR spectrum will also show peaks at multiples of
the wavelengths of interest (as discussed in Appendix 11);
so the corresponding shortest period length is the one
desired. (b) SVR spectrum analysis requires only the
singular values and hence will not be computation-intensive
if the singular vectors are not computed. The typical
figures for the number of operations (flops) for the
computation of $S$ and of $U$, $S$, $V$ are $(2mn^2 + 2n^3)$
and $(4m^2 n + 22n^3)$ respectively for an $m \times n$
matrix (Golub and Van Loan, 1989, p.239).
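As a worked illustration of the two operation counts, consider a hypothetical data matrix with $m = 20$ rows and $n = 11$ columns (sizes chosen here only for the arithmetic):

```latex
\begin{align*}
\text{$S$ only:} \quad & 2mn^2 + 2n^3 = 2(20)(121) + 2(1331) = 7502, \\
\text{$U$, $S$, $V$:} \quad & 4m^2 n + 22n^3 = 4(400)(11) + 22(1331) = 46882.
\end{align*}
```

For these sizes, computing the singular values alone needs roughly one sixth of the operations required for the full decomposition, which matters when the SVR spectrum is evaluated over many row lengths.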


11.4.5 Application Examples 

The periodic decomposition can be applied both for the
separation of constituent periodic component(s) in a
composite series and for the modelling and prediction of any
quasiperiodic series through the periodic component(s).

Figure 11.4.1 The composite series $\{y(k)\}$ to be
decomposed.


Example 11.4.5(1) Decomposition of a composite signal 
into periodic components 

Three perfectly periodic components $y_{17}$, $y_7$ and
$y_{30}$ of period lengths 17, 7 and 30 respectively are added
along with white Gaussian noise of signal-to-noise ratio 5
with respect to the weakest periodic component $y_{30}$,
forming the composite signal $\{y(k)\}$ shown in Fig. 11.4.1.
The decomposition is performed as follows.

(1) The SVR spectrum is computed for row lengths varying
between 5 and 50. The SVR spectrum (Fig. 11.4.2(a)) shows
peaks at row lengths of 17 and 34, indicating the presence
of a component with period length 17. $\{y(k)\}$ is formed
into a matrix $A$ of row length 17, which is SV-decomposed,
and the component $\{y_{17}(k)\}$ is extracted using (11.4.2).

(2) The SVR spectrum of the residual sequence
$\{y(k)-y_{17}(k)\}$ is computed for row lengths varying
between 5 and 50. The SVR spectrum (Fig. 11.4.2(b)) shows
peaks at row lengths of 7, 14, 21 etc. So the data
$\{y(k)-y_{17}(k)\}$ are formed into a matrix of row length
7, which is SV-decomposed, and the component $\{y_7(k)\}$ is
extracted.

(3) The subsequent residual sequence
$\{y(k)-y_{17}(k)-y_7(k)\}$ is now formed; its SVR spectrum
(Fig. 11.4.2(c)) for row lengths varying between 5 and 70
shows peaks at row lengths 30 and 60; so the data are
arranged into a matrix of row length 30 and, using SVD, the
periodic component of period length 30, $\{y_{30}(k)\}$, is
extracted.

Figure 11.4.2 (a) SVR spectrum of the composite signal
$\{y(k)\}$ shown in Fig. 11.4.1,

(b) SVR spectrum of the residual sequence
$\{y(k)-y_{17}(k)\}$,

(c) SVR spectrum of the subsequent residual sequence
$\{y(k)-y_{17}(k)-y_7(k)\}$.

Figure 11.4.3 The decomposed periodic components of
$\{y(k)\}$ (actual data and estimated data).

(a) The actual component $\{y_{17}(k)\}$ and its estimate,

(b) the actual component $\{y_7(k)\}$ and its estimate,

(c) the actual component $\{y_{30}(k)\}$ and its estimate.

The resulting residual sequence
$\{y(k)-y_{17}(k)-y_7(k)-y_{30}(k)\}$ is found to be of too
small a magnitude to be of interest. The extracted
(estimated) periodic components are shown in Fig. 11.4.3.

Remark 

It can be shown (Kanjilal and Palit, 1995) that the periodic 
decomposition is possible even in the case of components 
having nonintegral period lengths. 

Case Study: Modelling and long term prediction of the 
sunspot series 


Decomposition 

The yearly averaged sunspot data for the years 1700-1920 are 
used for modelling. To construct the SVR spectrum, a data
window of size 4 is used; the median of the available values
for the ratio $\rho$ ($= s_1/s_2$) is plotted against varying
row length. The SVR spectrum of the series (Fig. 11.4.4)
shows the
first periodic component to be of period length 11, which is 
extracted using (11.4.2). The residual series is further 
analysed by SVR spectrum and a peak is detected at the 
period length of 10; the corresponding periodic component is 
also extracted. SVR spectrum on the subsequent residual 
series does not show any conspicuous peak. The estimated 
periodic components of the sunspot series are shown in 
Fig.11.4.5. 



Figure 11.4.4 SVR spectrum of the sunspot series.

Figure 11.4.5 (a) The strongest periodic component in
the sunspot series with period length 11,

(b) the second periodic component with period length 10.


Prediction 

For each periodic component given by $z_1 v_1^T$ in (11.4.2),
the one-period-ahead prediction is given by
$\hat{z}_{(m+1)1} v_1^T$, where $\hat{z}_{(m+1)1}$ is the
predicted (m+1)th element of the m-vector $z_1$; the elements
of $z_1$ (say, $\{x(k)\}$) may be modelled as an AR process.

In this particular case, the elements of $z_1$ corresponding
to the period length 11 are modelled as

$x(k) = -128.659 - 0.542x(k-1) + 0.344x(k-2)$,

and those corresponding to the period length 10 are
modelled as

$x(k) = -24.738 - 0.759x(k-1) + 0.597x(k-2) - 0.129x(k-3)$.

In both cases, the terminating point of the periodic
components is taken to be the same (1920 here). The
one-period-ahead predictions of both series are computed
and replicated up to the year 1970. Fig. 11.4.6 shows the
prediction of the series from 1921 to 1970 using data up to
1920. The MSE for the actual prediction period 1921-1931 is
129.28 per sample.

Figure 11.4.6 The prediction of the sunspot series
for the period 1921 to 1975.
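The prediction step can be sketched as a least-squares AR fit of the scaling factors followed by a one-period-ahead forecast. The sketch is illustrative only: the helper names `fit_ar_with_constant` and `predict_next`, the synthetic scaling-factor sequence and the pattern `v1` are hypothetical, and the sunspot data themselves are not reproduced.

```python
import numpy as np

def fit_ar_with_constant(x, order):
    """Least-squares fit of x(k) = c + a_1 x(k-1) + ... + a_p x(k-p)."""
    x = np.asarray(x, dtype=float)
    rows = [np.concatenate(([1.0], x[k - order:k][::-1]))
            for k in range(order, len(x))]
    theta, *_ = np.linalg.lstsq(np.array(rows), x[order:], rcond=None)
    return theta                               # [c, a_1, ..., a_p]

def predict_next(x, theta):
    """One-step-ahead prediction from the most recent values of x."""
    order = len(theta) - 1
    lags = np.asarray(x, dtype=float)[-order:][::-1]   # x(k-1), ..., x(k-p)
    return theta[0] + theta[1:] @ lags

# Hypothetical scaling factors generated by a known second-order model.
z1 = [1.0, 3.0]
for _ in range(10):
    z1.append(2.0 + 0.5 * z1[-1] - 0.3 * z1[-2])
z1 = np.array(z1)

theta = fit_ar_with_constant(z1, order=2)
z_next = predict_next(z1, theta)               # predicted (m+1)th element

v1 = np.array([0.5, 0.5, -0.5, 0.5])           # hypothetical unit-norm pattern
next_period = z_next * v1                      # one-period-ahead prediction
```

For longer horizons the predicted scaling factor would be appended to the sequence and the step repeated, or, as in the case study, the predicted period would simply be replicated.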



11.5 CONCLUSIONS 

Modelling of quasiperiodic series in terms of periodic 
components has been presented, a periodic component being 
characterized by a repetitive pattern associated with a 
scaling factor. Three different approaches have been 
considered. The first two approaches are applicable to data 
series which exhibit a certain degree of repetitiveness of 
the pattern. In both the cases the period length is assumed 
to remain unchanged and interpolated data are used within 
the sequence wherever necessary for (near-)alignment of 
consecutive (pseudo-)periods for modelling purposes. The 
third approach involves periodic decomposition of any series 
with or without any conspicuous periodicity. 

In the first approach, nonlinear transformation and
linear modelling are used. The quasiperiodic series is first
singular value decomposed into a relatively regular part and
a residual irregular part. The irregular part is nonlinearly
transformed such that after transformation again a
relatively regular part emerges as dominant and so on. The
relatively regular parts individually represent nearly 
periodic series. Since SV-decomposition of a periodic series 
produces a pattern component and scaling factors for the 
pattern component to represent the consecutive periods, each 
periodic series is modelled as having a constant pattern and 
the scaling factors are modelled as a linear series. Usually 
the original time domain periodic component is the strongest 
component, and the amount of information which can be 
sensibly modelled from the relatively irregular part is very 
much dependent on the correct choice of the nonlinear 
transformation applied. This is one of the disadvantages of 
this method. 

In the second approach, the scaling factor sequence is 
modelled nonlinearly using a neural network. Here, the main 
emphasis is on better modelling of the prime relatively
regular part only. Since nonlinearity is permitted in the 
model of the scaling factor sequence, it is expected that 
the representativeness of modelling will improve in this 
case, which is confirmed by the illustrative examples. 

The modelling of a quasiperiodic process through
periodic decomposition is by far the most straightforward. The
periodic components can have any period length which is 
determined and need not be assumed. The scaling factors for 
the consecutive periods in case of each component can be 
modelled linearly or nonlinearly. It is expected that since 
the underlying characteristic of the process is modelled 
in terms of constituent periodic components, the quality of 
long term (that is multiperiod in this case) prediction will 
also be good. 

The subject treated in this chapter is an area of
recent development, and it is expected that further work
will lead to improved models and prediction results.

CHAPTER 12


PREDICTIVE CONTROL (Part-I): INPUT-OUTPUT MODEL BASED 

Predictive control aims at obtaining the predicted 
performance of the process as specified. 


12.1 INTRODUCTION 

Given any input-output process, a mathematical model of the 
process can be formulated from the available measurements 
and information, and a suitable control input to be applied
to the process can be computed such that a specified 
performance criterion is satisfied. Appropriate control 
input is expected to result in the process output reaching 
the desired set point. Process control can be difficult when 
the process itself, as represented by the model, is 
difficult to control, or when the information available to 
the controller is imprecise or incorrect. Some features 
which make a real-life process difficult to control are as
follows:

(a) the order of the process may not be known, 

(b) the process may be dynamic but the operational data 
available on the input and output may not be rich and may 
not reveal the salient process characteristics, 

(c) the time delay between the input and the output may not 
be fixed or known, 

(d) the process may be open-loop unstable, 

(e) the process may be nonminimum-phase in nature. 

The first three features make the correct identification and 
parameter estimation of the process difficult; the last 
three features demand particular attention in the choice of
the cost criterion which the control law must optimize. In 
addition there may be unknown disturbances acting on the 
process. A successful control strategy has to perform 
sensibly irrespective of the inaccuracies, inconsistencies 
and difficulties stated above. 

The design of the controller is influenced by the
process model available but the quality of control largely
depends on the performance criterion optimized.

The performance criterion may concern the desired
closed-loop controller characteristics, as in the case of the
pole-placement controller, or it may concern a cost
criterion (or cost function) to be minimized. The cost
criterion minimized can be based either

(a) on the past performance, or

(b) on the predictive performance.

The traditional PI (i.e. proportional plus integral) control
belongs to the former category while the predictive control
belongs to the latter; the predictive control performs
better particularly for processes with time delay because
the future dynamics of the process can be more correctly
defined in this case. Again the cost on predictive
performance can be based either

(a) on a single-step cost function, or

(b) on a multistep cost function,

where single-step or multistep refers to the number of
time steps over which the cost function is optimized. The
conventional self-tuning control (STC) belongs to the former
category; the latter is the class of controllers known as
the long-range predictive controllers. 'Long-range' refers
to the prediction horizon being extended over multiple steps
into the future; the control law takes into account the
implication of the control action over this prediction
horizon. The attractive feature of long-range predictive
control (LRPC) methods is that they possess better stability
and robustness properties than the non-LRPC methods, and can
perform satisfactorily despite wrong prior assumptions
about the process and its environment. Different designs
are possible for predictive controllers; these are
characterized by the process model, the prediction scheme,
and the cost function optimized.

This chapter describes some of the popularly used
predictive control methods for linear systems. A real-life
process is usually dynamic in nature, and it works in a
stochastic environment; so it is necessary for the
controller to be adaptive. Sec. 12.2 introduces STC, which is
one of the widely used adaptive control methods; STC
automatically tunes its parameters in a stochastic
environment to obtain the desired performance of the
closed-loop system. The cost minimized is a single-step
function of the deviation of the predicted output from the
desired value at a specific point in the future, which is
usually the process time-delay ahead (the time-delay being
the time between the application of an input and its
response in the output). In practice, STC can provide stable
control provided certain prior assumptions about the process
remain valid. However, improper choices of time-delay or
model order (which are quite likely in complex dynamical
processes) can easily destabilize STC. LRPC schemes can be
designed to provide satisfactory control under the same
circumstances.

The most important feature of an LRPC is the minimization
of a multistep cost function, which is a quadratic
function of the deviation of the predicted output from the
desired output over multiple steps (or a horizon) in the
future. In this chapter three popular classes of LRPC
algorithms are studied, namely the pulse-response model
based LRPC, the step-response model based LRPC and the
Generalized Predictive Control algorithm, which is a CARMA
or CARIMA model based LRPC. The different LRPC methods have
certain generic structural features, which are introduced in
Sec. 12.3. The main difference between the different designs
of LRPCs is due to the process models used. The algorithmic
details of the three different LRPC designs are presented in
Sec. 12.4, Sec. 12.5 and Sec. 12.6. The performance of the
LRPC schemes largely depends on the choice of the different
design parameters in the cost function optimized; this
subject is probed in Sec. 12.7. Again, the success of a
predictive control method is largely dependent on
numerically and computationally robust implementation, which
is discussed in Sec. 12.8.

This chapter is supported by two Appendices. Appendix 
12A presents a brief introduction to the area of systems and 
controls. Appendix 12B studies the Smith predictor, which is 
one of the early approaches to predictive control in a 
deterministic environment. 


12.2 SELF-TUNING CONTROL 
12.2.1 Basic Concepts 

An adaptive control scheme can be broadly considered as a
combination of an on-line estimation method for the process
parameters, and a controller design procedure. There are two
widely used configurations of adaptive controllers:

(1) Self-organizing adaptive controllers recursively iden- 
tify the process, and formulate the control strategy aiming 
at optimal performance. 

(2) Model reference adaptive controllers try to achieve a 
closed-loop system performance similar to that of a
reference model by recursive adaptation of the controller 
parameters. 

The self-organizing adaptive controllers may be categorized
into different configurations based on the performance
evaluation technique and the controller design procedure:

(a) Dual controllers are those that perform the dual
simultaneous functions of realizing the desired performance
and reducing the model uncertainty. These are designed on
the basis of available measurements as well as the future
observation programme and the associated statistics.

(b) Non-dual controllers are based on the present and the 
past information only. As a result, the rate at which the 
uncertainty about random process variables is reduced, is 
independent of the control action. Non-dual controllers may 
be further classified into certainty equivalent and cautious 
controllers. 

(i) Certainty equivalent controllers ignore the fact that
the estimated parameters, which are used to design the
controller, are not the true ones.

(ii) Cautious controllers, on the other hand, modify the
control action in recognition of the uncertainty associated
with the estimated parameters.

Self-tuning control (STC), which is one of the most widely
used adaptive control strategies, belongs to the class of
non-dual, certainty equivalent type self-organizing
controllers.

STC is applicable to processes with constant or slowly
varying parameters. A basic schematic diagram of the
application of STC is shown in Fig. 12.2.1. At every sampling
instant, the process parameters are estimated using a
recursive estimation method, and the estimated parameter
values are used to compute the control law, which optimizes
a prespecified cost criterion. Thus the uncertainty in the
estimated parameters is ignored and the control algorithm
uses them as if they were true. Thus the parameter
estimation and the design of the control law emerge as two
separate activities. The idea of separating the parameter
estimation from the controller design is known as the
separation principle. Although the algorithm starts with
limited knowledge of the process, in the course of the
recursions the estimated process parameters tend to
converge to the true values, and the controller approaches
an optimal state. This conforms to the self-tuning property,
which implies eventual convergence of the control law to the
one that could be designed if the true process model were
known.

Figure 12.2.1 Self-tuning control scheme
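The recursive estimation stage can be illustrated with a standard recursive least squares (RLS) update. This is a generic sketch, not the estimator used in the book: the class name `RecursiveLeastSquares`, the initial covariance `p0`, the forgetting factor and the first-order test process are assumptions made for illustration. In a certainty equivalent scheme the returned estimate would be passed to the control law as if it were the true parameter vector.

```python
import numpy as np

class RecursiveLeastSquares:
    """Standard RLS for y(k) = phi(k)^T theta + noise."""

    def __init__(self, n_params, p0=1e4, forgetting=1.0):
        self.theta = np.zeros(n_params)        # parameter estimate
        self.P = p0 * np.eye(n_params)         # covariance-like matrix
        self.lam = forgetting                  # 1.0 means no forgetting

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return self.theta

# Hypothetical noise-free process: y(k) = 0.8 y(k-1) + 0.5 u(k-1).
rng = np.random.default_rng(1)
u = rng.standard_normal(200)
y = np.zeros(200)
for k in range(1, 200):
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1]

estimator = RecursiveLeastSquares(n_params=2)
for k in range(1, 200):
    theta_hat = estimator.update([y[k - 1], u[k - 1]], y[k])
```

With a sufficiently exciting input the estimate approaches the true parameters (0.8 and 0.5 here); with steady data the matrix `P` can grow when forgetting is used, which is the covariance blow-up referred to under the practical aspects of STC.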

The concept of self-tuning dates back to Kalman (1958), 
who estimated the plant parameters from an on-line least 
squares algorithm and the estimates were used to derive a 
dead-beat control law at every sampling instant. The 
self-tuning principle was later revived by Peterka (1970), 
who incorporated stochastic features in the formulation. The 
subsequent reported works of Astrom and Wittenmark (1973),
and Clarke and Gawthrop (1975), boosted research and
practical applications of self-tuning controllers.


12.2.2 Control Algorithms 
Process model 

Consider an input-output model of the process in a
stochastic environment

$A(q^{-1})y(k) = B(q^{-1})u(k-d) + x(k)$,    (12.2.1)

where

$A(q^{-1}) = 1 + a_1 q^{-1} + \dots + a_n q^{-n}$,

$B(q^{-1}) = b_0 + b_1 q^{-1} + \dots + b_n q^{-n}$,

d is the maximum time-delay between the control input u and
the measured value of the process output y; x is the noise or
uncertainty in the model. There can be different
representations of x, as follows.

(a) $x(k) = e(k)$,    (12.2.2)

where $\{e(k)\}$ is an uncorrelated random sequence.

(b) $x(k) = C(q^{-1})e(k)$,    (12.2.3)

where $C(q^{-1})$ is assumed to be a stable but unknown
polynomial,

$C(q^{-1}) = 1 + c_1 q^{-1} + \dots + c_n q^{-n}$.

(c) $x(k) = T(q^{-1})e(k)$,    (12.2.4)

where $T(q^{-1})$ is a discrete-time noise observer polynomial:

$T(q^{-1}) = 1 + t_1 q^{-1} + \dots + t_r q^{-r}$.

(d) $x(k) = T(q^{-1})e(k)/\Delta$,    (12.2.5)

where $\Delta = 1 - q^{-1}$ is the unit difference operator.

Note that (12.2.2) leads to a controlled AR model, (12.2.3)
or (12.2.4) leads to a CARMA model, and (12.2.5) leads to a
CARIMA model of the process.

Cost criterion 

The minimized cost function is given by

$J = E\{(w(k) - y(k+d))^2 + \lambda (Qu(k))^2 \mid k\}$,    (12.2.6)

where w is the target set-point for the output to reach,
$\lambda$ is a scalar cost factor; Q, which may be a transfer
function, is typically chosen as $Q = \Delta = 1 - q^{-1}$. The
objective is to determine u(k) which minimizes the cost J.

Remarks

(a) When w(k) is constant, the controller is called the
regulator.

(b) When the costing on the control ($\lambda$ or Q) in J in
(12.2.6) is too large, the control will resemble open-loop
control, and when the costing on the control is zero, the
control becomes the minimum variance control.

(c) The cost function (12.2.6) is expressed as an ensemble
average because the process model is stochastic in nature.
When the parameters of the process model remain unchanged,
the cost criterion J yields the same optimal strategy as the
one minimizing J':

$J' = \frac{1}{N} \sum_{i=1}^{N} \{(w(k) - y(k+i))^2 + \lambda (Qu)^2\}$.

Control Law 

Assume the noise structure (12.2.5). Introduce the identity

$T(q^{-1}) = E(q^{-1})A(q^{-1})\Delta + q^{-d}F(q^{-1})$,    (12.2.7)

where the degree of E is $\delta E = d-1$ and
$\delta F < \delta(A\Delta)$,

$E(q^{-1}) = 1 + e_1 q^{-1} + \dots + e_{d-1} q^{-d+1}$,    (12.2.8)

$F(q^{-1}) = f_0 + f_1 q^{-1} + \dots + f_{n-2} q^{-n+2}$.    (12.2.9)
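The identity (12.2.7) can be solved for E and F by d steps of polynomial long division of T by $A\Delta$. The sketch below is one way to do this, with polynomials stored as coefficient arrays in ascending powers of $q^{-1}$; the function name `solve_identity` and the first-order example process are assumptions made for illustration.

```python
import numpy as np

def solve_identity(T, A, d):
    """Solve T = E*A*Delta + q^-d * F for E (degree d-1) and F.

    T and A are coefficient arrays in ascending powers of q^-1,
    with A monic (leading coefficient 1).
    """
    A_delta = np.convolve(A, [1.0, -1.0])           # A(q^-1) (1 - q^-1)
    length = max(len(T), len(A_delta) + d - 1)      # room for E*A*Delta
    rem = np.zeros(length)
    rem[:len(T)] = T
    E = np.zeros(d)
    for i in range(d):                              # d division steps
        E[i] = rem[i] / A_delta[0]
        rem[i:i + len(A_delta)] -= E[i] * A_delta
    F = rem[d:]                                     # remainder is q^-d F
    return E, F

# Hypothetical example: A = 1 - 0.8 q^-1, B = 0.5, T = 1, d = 2.
A = np.array([1.0, -0.8])
B = np.array([0.5])
T = np.array([1.0])
d = 2

E, F = solve_identity(T, A, d)
G = np.convolve(E, B)                               # G = E B
```

For this example the division gives $E = 1 + 1.8q^{-1}$ and $F = 2.44 - 1.44q^{-1}$, and $F(1) = T(1) = 1$, as the identity requires at $q = 1$ where $\Delta$ vanishes.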

From (12.2.1) and (12.2.5), dropping the argument $(q^{-1})$
for simplicity,

$EA\Delta y(k+d) = EB\Delta u(k) + ETe(k+d)$.

Using (12.2.7)-(12.2.9),

$Ty(k+d) = Fy(k) + G\Delta u(k) + ETe(k+d)$,    (12.2.10)

where

$G(q^{-1}) = E(q^{-1})B(q^{-1}) = g_0 + g_1 q^{-1} + \dots + g_{n+d-1} q^{-n-d+1}$.

Since $E(q^{-1})$ is of order (d-1), $Ee(k+d)$ will have the
terms e(k+d), ..., e(k+1), which will be independent of the
y and u terms in $Fy(k)$ and $G\Delta u(k)$. Hence the d-step
ahead predictor is given by

$\hat{y}(k+d|k) = Fy_f(k) + G\Delta u_f(k)$,    (12.2.11)

where $y_f$ and $u_f$ are the filtered signals:

$y_f(k) = y(k)/T(q^{-1})$,  $u_f(k) = u(k)/T(q^{-1})$,

and F and G are estimated polynomials.

The prediction error is given by

$\tilde{y}(k+d|k) = Ee(k+d)$.

Since

$\partial \hat{y}(k+d|k) / \partial u(k) = g_0$,

the cost in (12.2.6) is minimized for

$(\hat{y}(k+d|k) - w(k))g_0 + \lambda Qu(k) = 0$.    (12.2.12)

The control law is obtained as

$u(k) = \dfrac{w(k) - Fy_f(k) - \left(\frac{G}{T} - g_0\right)\Delta u(k) + g_0 u(k-1)}{g_0 + \frac{\lambda Q}{g_0}}$.    (12.2.13)
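A minimal closed-loop sketch of the control law follows, under simplifying assumptions that are not part of the text: the process parameters are known (no estimation), the process is noise-free, $T = 1$ so that no filtering is needed, and $Q = \Delta$, in which case the minimizing condition reduces to $\Delta u(k) = g_0 (w - \text{free}) / (g_0^2 + \lambda)$, where "free" is the part of the predictor (12.2.11) that does not depend on u(k). The process $(1 - 0.8q^{-1})y(k) = 0.5u(k-2)$ and the function name `run_control` are hypothetical; F and G were obtained from (12.2.7) for this process.

```python
import numpy as np

def run_control(n_steps=100, lam=0.1, w=1.0):
    """Single-step predictive control of a known, noise-free process."""
    d = 2                                  # time-delay of the process
    F = np.array([2.44, -1.44])            # from (12.2.7) with T = 1
    G = np.array([0.5, 0.9])               # G = E B, E = 1 + 1.8 q^-1
    g0 = G[0]
    y = np.zeros(n_steps)
    u = np.zeros(n_steps)
    du = np.zeros(n_steps)
    for k in range(n_steps):
        y_prev = y[k - 1] if k >= 1 else 0.0
        u_delayed = u[k - d] if k >= d else 0.0
        y[k] = 0.8 * y_prev + 0.5 * u_delayed       # process response
        du_prev = du[k - 1] if k >= 1 else 0.0
        # part of the d-step predictor that does not depend on u(k)
        free = F[0] * y[k] + F[1] * y_prev + G[1] * du_prev
        # minimizer of (w - free - g0*du)^2 + lam*du^2
        du[k] = g0 * (w - free) / (g0 ** 2 + lam)
        u_prev = u[k - 1] if k >= 1 else 0.0
        u[k] = u_prev + du[k]
    return y, u

y, u = run_control()
```

Because the control weighting acts on the increment $\Delta u(k)$, the output settles at the set point without offset, and the input settles at the value 0.4 required by the steady-state gain of 2.5 of this process.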


Characteristic Features 

(a) Closed-loop properties 

Equation (12.2.13) may also be expressed as

$u(k) = \dfrac{w(k) - \frac{F}{T}y(k)}{\frac{G\Delta}{T} + Q'}$,  $Q' = \dfrac{\lambda Q}{g_0}$.    (12.2.14)

The closed-loop equation obtained by substituting for u(k)
from (12.2.14) in the process model is given by

$y(k) = \dfrac{B}{B + Q'A} w(k-d) + \dfrac{EB + Q'T/\Delta}{B + Q'A} e(k)$.    (12.2.15)

Thus even if the process is a nonminimum-phase process (i.e.
$B(q^{-1})$ having roots outside the unit circle), with
suitable costing Q' on the control, the closed-loop roots
will be stable. Q' can shift the closed-loop roots from those
of $B(q^{-1})$ (i.e. for Q' being low) towards those of
$A(q^{-1})$ (i.e. for Q' being high). Thus if the open-loop
process is stable (i.e. $A(q^{-1})$ being stable), the
closed-loop process will also be stable for sufficiently
high Q'. Again, improper choice of the time-delay d may also
produce $b_0$ as low or zero, where appropriate control
costing can ensure stable control.

The present control algorithm is not applicable for 
processes which are both nonminimum-phase as well as 
open-loop unstable. 
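The migration of the closed-loop roots with the control costing can be checked numerically. The sketch below assumes, for simplicity, that Q' is a constant (rather than a transfer function) and uses a hypothetical first-order process with a nonminimum-phase B; the function name `closed_loop_roots` is also an assumption. The roots are computed in the z-plane from the characteristic polynomial $B + Q'A$.

```python
import numpy as np

def closed_loop_roots(A, B, q_weight):
    """Roots (in z) of B(q^-1) + Q' A(q^-1) for a constant Q'.

    A and B are coefficient arrays in ascending powers of q^-1.
    """
    n = max(len(A), len(B))
    a = np.pad(np.asarray(A, float), (0, n - len(A)))
    b = np.pad(np.asarray(B, float), (0, n - len(B)))
    # coefficients of q^0, q^-1, ... are those of z^(n-1), z^(n-2), ...
    return np.roots(b + q_weight * a)

A = [1.0, -0.9]          # stable open-loop pole at z = 0.9
B = [1.0, 2.0]           # zero at z = -2, outside the unit circle

roots_no_cost = closed_loop_roots(A, B, 0.0)     # root of B
roots_costed = closed_loop_roots(A, B, 10.0)     # moved towards root of A
```

With no costing the closed-loop root coincides with the unstable zero of B at z = -2; with Q' = 10 it moves to z = 7/11, inside the unit circle, and it approaches 0.9 as Q' grows further.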

(b) Model reference features 

Model reference features may be introduced into the
self-tuning control algorithm by defining an auxiliary output
$\phi$ as

$\phi(k) = P(q^{-1})y(k)$,  $P(q^{-1}) = \dfrac{P_n(q^{-1})}{P_d(q^{-1})}$,

and devising a predictive control law u(k) such that
$\phi(k+d)$ reaches w(k), which implies y(k+d) reaching
$w(k)/P(q^{-1})$, the filtered set point; in other words the
output follows the model $M(q^{-1}) = 1/P(q^{-1})$. The choice
of P having a steady-state value of unity,
$P(q^{-1})|_{q=1} = 1$, ensures offset-free control.
Typically $P(q^{-1}) = (1 - 0.6q^{-1})/0.4$.
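For the typical choice $P(q^{-1}) = (1 - 0.6q^{-1})/0.4$, the reference model is $M(q^{-1}) = 0.4/(1 - 0.6q^{-1})$, a first-order filter with unit steady-state gain. The short sketch below filters a unit step through this model; the function name `reference_model_response` and the step length are assumptions made for illustration.

```python
import numpy as np

def reference_model_response(w, pole=0.6):
    """Filter the set point w through M = (1 - pole) / (1 - pole q^-1)."""
    y_m = np.zeros(len(w))
    prev = 0.0
    for k, w_k in enumerate(w):
        prev = pole * prev + (1.0 - pole) * w_k   # y_m(k) = 0.6 y_m(k-1) + 0.4 w(k)
        y_m[k] = prev
    return y_m

step = np.ones(40)
y_m = reference_model_response(step)
```

The response starts at 0.4 and rises monotonically towards 1, so the filtered set point that the output is asked to follow reaches the target without offset.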

Remarks 

(a) Control weighting: The control weighting in the cost
function (12.2.6) can serve two basic purposes: (i) to
produce stable control for nonminimum-phase plants and (ii)
to generate moderate or acceptable (also called detuned)
control action. For offset-free control, $Q(q^{-1})$ should be
a transfer function with zero steady-state gain, the
simplest choice being $Q(q^{-1}) = 1 - q^{-1}$.

(b) Noise observer polynomial $T(q^{-1})$: In most practical
applications a first-order $T(q^{-1})$ suffices. $1/T(q^{-1})$
effectively acts as a low-pass filter for the data. Since
the time differencing $\Delta$ acts as a high-pass filter,
$\Delta/T(q^{-1})$ effectively acts as a band-pass filter on
the data (e.g., in (12.2.17)).

Implementation aspects 

The two broad approaches for the design of the self-tuning
controller are: (i) the explicit or indirect method and
(ii) the implicit or direct method. These methods differ in
the way the controller parameters are estimated. The
implementation requires $T(q^{-1})$, $Q(q^{-1})$, $\lambda$ and
w(k) to be known. A summary of the explicit and the implicit
methods follows.

Explicit approach

(1) The parameters of the process model (12.2.1) are
estimated using a recursive parameter estimator.

(2) The parameters of $E(q^{-1})$ and $F(q^{-1})$ are computed
using (12.2.7).

(3) The control u(k) is determined using (12.2.13).

Implicit approach 

The controller parameters can be directly estimated from
(12.2.10), but the input data in (12.2.10) are incremental
whereas the output data are positional; this imbalance may
affect parameter estimation. To alleviate this problem,
introduce the identity

$$F(q^{-1}) = T(q^{-1}) + \Delta F'(q^{-1}). \qquad (12.2.16)$$

So (12.2.10) can now be expressed as 

$$\hat{y}(k+d|k) - y(k) = \frac{F'\Delta y(k)}{T} + \frac{G\Delta u(k)}{T}. \qquad (12.2.17)$$

The implicit method can proceed as follows: 

(1) The controller parameters are directly estimated from 
(12.2.10) using a recursive parameter estimation method. 

(2) The control u(k) is determined using (12.2.13). 

Practical aspects 

(a) Robust control requires robust estimation. If the data 
are not rich but rather steady, the estimator becomes prone 
to blow-up of the covariance matrix. In such cases it is 
necessary to bypass the parameter estimation stage. For
further discussion of robust parameter estimation see
Sec. 3.4.2.

(b) The control law discussed assumes a linear process. In 
practice, nonlinearity may be present in various forms. For 
example in the case of an industrial process, the actuator, 
which is used to implement a change in the control input, 
may have nonlinear characteristics; such types of 
nonlinearities should be taken into account separately. It 
is necessary to ensure that the control action proposed is 
actually exerted. 

Remark: These practical aspects concern all adaptive
controllers.
Example 12.2.2 Compute the self-tuning control law for the
process given by

$$(1 - 0.8q^{-1})y(k) = 0.2u(k-2) + (1 - 0.6q^{-1})e(k),$$

minimizing the cost given by (12.2.6) with Q = 1, and given
values of $\lambda$ and w(k).

The process model used is

$$A(q^{-1})y(k) = B(q^{-1})u(k-d) + C(q^{-1})e(k).$$

Following (12.2.7-12.2.13), the key equations for the
present problem are as follows (the polynomial argument
$(q^{-1})$ is dropped for simplicity).

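(i) $C = EA + q^{-d}F$, $T(q^{-1}) = C(q^{-1})$,

with $E = 1 + e_1 q^{-1}$, $\delta F < \delta A$ or $F = f_0$.

(ii) $Cy(k+d) = Fy(k) + EBu(k) + ECe(k+d)$.

(iii) $\hat{y}(k+d|k) = Fy(k)/T + Gu(k)/T$, $G = EB$.

(iv) Setting the gradient of the cost with respect to u(k)
to zero gives

$$-(w(k) - \hat{y}(k+d|k))g_0 + \lambda u(k) = 0,$$

that is, after dividing by $g_0$,

$$-w(k) + Fy_f(k) + \left[G + \frac{\lambda}{g_0}T\right]u_f(k) = 0, \qquad y_f = y/T, \quad u_f = u/T.$$

(v) $$u_f(k) = \frac{w(k) - Fy_f(k) - (G - g_0)u_f(k) - \dfrac{\lambda}{g_0}(T - 1)u_f(k)}{g_0 + \dfrac{\lambda}{g_0}}.$$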

In the present case

$$A(q^{-1}) = 1 - 0.8q^{-1}, \quad B(q^{-1}) = 0.2, \quad C(q^{-1}) = 1 - 0.6q^{-1},$$

and d = 2. E and F are solved using the identity

$$(1 - 0.6q^{-1}) = (1 + e_1 q^{-1})(1 - 0.8q^{-1}) + q^{-2}f_0.$$

So $e_1 = 0.2$, $f_0 = 0.16$, and

$$G(q^{-1}) = (1 + 0.2q^{-1})\,0.2 = 0.2 + 0.04q^{-1}.$$

Hence

$$(1 - 0.6q^{-1})y(k+2) = 0.16y(k) + (0.2 + 0.04q^{-1})u(k) + (1 + 0.2q^{-1})(1 - 0.6q^{-1})e(k+2).$$

That is

$$y(k+2) = [0.16y_f(k) + (0.2 + 0.04q^{-1})u_f(k)] + (1 + 0.2q^{-1})e(k+2),$$

where $y_f(k) = y(k)/(1 - 0.6q^{-1})$ and $u_f(k) = u(k)/(1 - 0.6q^{-1})$;
note that the noise term outside the brackets is
uncorrelated with the data at time k. Hence

$$u_f(k) = \frac{w(k) - 0.16y_f(k) - 0.04u_f(k-1) + \dfrac{\lambda}{0.2}\,(0.6u_f(k-1))}{0.2 + \dfrac{\lambda}{0.2}},$$

from which the desired control u(k) can be computed.
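The arithmetic of Example 12.2.2 can be checked numerically. The following is a minimal Python sketch, not part of the original algorithm description: the function names `poly_mul`, `solve_identity` and `control_step` are illustrative assumptions, polynomials are stored as coefficient lists in increasing powers of $q^{-1}$, and the identity is solved by d steps of polynomial long division.

```python
def poly_mul(a, b):
    """Multiply two polynomials in q^-1 (list index = power of q^-1)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def solve_identity(c, a, d):
    """Solve C = E*A + q^-d * F by d steps of polynomial long division."""
    n = max(len(c), d + len(a) - 1)
    rem = list(c) + [0.0] * (n - len(c))      # running remainder
    e = []
    for i in range(d):
        e_i = rem[i] / a[0]
        e.append(e_i)
        for j, a_j in enumerate(a):
            rem[i + j] -= e_i * a_j
    return e, rem[d:]                          # E coefficients, F coefficients


def control_step(w, y_f, u_f_prev, lam, g, f0, t1):
    """Equation (v) for first-order G and T = 1 + t1*q^-1.

    Returns (u_f(k), u(k)), where u(k) = T u_f(k).
    """
    g0, g1 = g
    num = w - f0 * y_f - g1 * u_f_prev - (lam / g0) * t1 * u_f_prev
    u_f = num / (g0 + lam / g0)
    return u_f, u_f + t1 * u_f_prev


# Data of Example 12.2.2
A, B, C, d = [1.0, -0.8], [0.2], [1.0, -0.6], 2
E, F = solve_identity(C, A, d)
G = poly_mul(E, B)
print("E =", E, "F =", F, "G =", G)

# One control step with hypothetical values of w, y_f, u_f(k-1) and lambda
print(control_step(w=1.0, y_f=0.0, u_f_prev=0.0, lam=0.1, g=G, f0=F[0], t1=C[1]))
```

Up to rounding, the computed coefficients agree with $e_1 = 0.2$, $f_0 = 0.16$ and $G = 0.2 + 0.04q^{-1}$ obtained in the example.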


12.2.3 Controller as Operator-guide: An Example 

Many industrial processes work on manual control or fixed
parameter control like the PID control. Plant managers, not
being fully conversant with the capabilities of adaptive
control schemes, are often reluctant to permit trials of
adaptive controllers in their plants. One possible
compromise is to go for adaptive prediction and an off-line
control strategy, where on-line plant data are used and the
control is computed but is not applied directly to the
plant; the control is used as an operator-guide. An
industrial application of such a scheme is presented here.

Strand speed control in iron-ore sintering 

One important part of iron-ore sintermaking is the on-strand
process (see Sec. 5.6.1), where sintering actually takes
place. Proper sintering requires the strand to be driven at
a certain speed, such that the on-strand process of
sintering is complete. Too high a strand speed causes
generation of weak sinter and a large amount of return-fines
which have to be recycled; too low a strand speed results in
a drop in production and uneconomically high sinter
strength. The trials reported here were conducted at the
Redcar Sintering Plant of British Steel Corporation in 1982.

At the time of the trial, the conventional strand-speed 
control system was not working, and a manual control was in 
use, with the operator making occasional changes in the 
strand speed with the aim of achieving a reasonably steady 
waste gas temperature (WG Temp). Here the strand speed is 
treated as the input and the WG Temp is treated as the 
output. 

The objective of the plant trial was

(a) to produce multistep prediction of WG Temp, and

(b) to present to the operator the computed value of the
predictive control to be treated as the desired strand speed
(see Fig. 12.2.2).

Figure 12.2.2 The sinter strand control scheme with
adaptive predictor/controller as operator guide.

Since frequent changes in strand speed are undesirable
due to the integrated nature of the process (see Fig. 5.6.1),
the operators were advised by the management to change the
strand speed according to the predictive control advice
every 1/2 hr, if necessary.

An ARMA model of the process was used; the time delay 
between the input (strand speed) and the output (WG Temp) 
was found to be 8 min. from the recorded historical data. 
The sampling time was chosen as 2 min. A control strategy of 
the generalized minimum variance category was used to 
compute the control law. 

At each sampling time, the 8 min ahead prediction of
the WG Temp was produced as shown in Fig. 12.2.3(a). The
corresponding (implementable) speed control advice and the
actual strand speed profile are shown in Fig. 12.2.3(b). It
is seen that the WG Temp remained close to the desired range
of 145°-150°, except when the control advice had been
completely ignored, e.g. around 19 hrs and around 23 hrs.

Remark 

Although the controller has been used only as an operator 
guide, it should be possible to use the controller directly 
on an on-line basis. 
Figure 12.2.3 (a) 8-minute ahead prediction of the
WG temperature in the sintering process (plant trial),
(b) the strand speed advice and the actual strand
speed corresponding to the WG temperature shown in (a).


12.3 LONG RANGE PREDICTIVE CONTROL 
12.3.1 Introduction 

The long range predictive control (LRPC) methods are
characterized by the process model used and the multistep
cost function optimized. LRPC broadly falls into three
categories: (i) pulse response model based, (ii) step
response model based, and (iii) the Generalized Predictive
Control class, which is CARIMA model based. In each of these
categories different designs exist, many of which have been
successfully used in practice.

IDCOM (IDentification COMmand) due to Richalet et al 
(1978) is pulse response model based; the Model Algorithmic 
Control (Rouhani and Mehra, 1982) also belongs to this cate- 
gory. The step response model based design is due to Cutler 
and Ramaker (1980), who called it Dynamic Matrix Control 
(DMC). Clarke and Zhang (1987) modified IDCOM and DMC by 
incorporating an integrating noise structure in the process 
model for natural elimination of offsets; this model 
structure is used in the present study. The Generalized 
Predictive Control (GPC) due to Clarke et al (1987) unifies 
the different design features for LRPC into one algorithm. 
The interest in LRPC is mainly because most LRPC designs can 
produce stable and robust control irrespective of the 
difficult real-life problems of nonminimum-phasedness, 
varying time-delay, unknown model order etc. 

In the self-tuning control (STC) discussed in the last
section, the control designed at every sampling instant is
aimed at driving the output to the desired set-point at a
specific time step in future. On the other hand, in LRPC,
the control computed at every sampling instant is intended
to maintain the output at the desired set-point over certain
multiple time steps (or a horizon) in future. Instead of a
single-step cost function as in STC, LRPC is based on the
minimization of a multistep cost function. The ultimate
result is a comparatively superior control performance of
LRPC in terms of robustness and stability. The structural
difference between STC and LRPC schemes is shown in
Fig. 12.3.1.


12.3.2 The Generic Structure 

LRPCs are characterized by 

(a) the process model used, and 

(b) the multistep cost function minimized. 

The specification of the cost function includes

(a) the horizon in the future over which the cost function
is minimized for control calculations,

(b) the type of control increments penalized and the
cost on the control increments, and

(c) the future set-point sequence etc.

Figure 12.3.1 Comparison between the control law
computation schemes of self-tuning control and LRPC.

Fig. 12.3.2 shows the typical movements of the output in 
response to a certain control input sequence, given the 
desired set-point sequence. 

Cost function 

The prime objective is to minimize the squared error
$\varepsilon$ between the predicted output ($\hat{y}$) and the
set-point (w) over a specified horizon in future with
minimum control effort. The generic cost function is given by

$$J = \varepsilon^T\varepsilon + \lambda[\Delta u]^T[\Delta u], \qquad (12.3.1)$$

where $\varepsilon$ is the predicted error vector, and
$\Delta u$ is the incremental control input:

$$\Delta u = [\Delta u(k) \;\; \Delta u(k+1) \;\ldots\; \Delta u(k+N-1)]^T,$$

$$\varepsilon = [\varepsilon(k+1) \;\; \varepsilon(k+2) \;\ldots\; \varepsilon(k+N)]^T;$$

$\varepsilon(k+N) = w(k+N) - \hat{y}(k+N|k)$, N being the length
of the predictive horizon.

Figure 12.3.2 Typical profiles of the output (y), the
set-point (w) and the control input (u); $y_A$ is the output
response due to unchanged control input u(k-1), $y_B$ is the
additional response due to the additional control input
$\delta u$, w* is the low-pass filtered set-point trajectory.

The control law is computed through the following 
steps: modelling and parameter estimation, prediction of the 
output, and computation of the control u(k); the latter two 
steps are summarized below, while the parameter estimation 
is treated in Sec. 12. 3. 2. 

Output prediction 

The two basic aspects of output prediction that concern LRPC 
designs are the horizon over which prediction is performed, 
and the known and the unknown components of the predicted 
output. 

The output y(k) is predicted over the specified horizon
$(k+N_1)$ to $(k+N_2)$. Ideally $N_1 = d$, the time delay, but
in the present discussions a default value of $N_1 = 1$ is
used.

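The prediction $\{\hat{y}(k+i|k),\ i = 1 \text{ to } N\}$ has two
components, as follows:

$$\begin{aligned}
\hat{y} &= [\hat{y}(k+1|k) \;\; \hat{y}(k+2|k) \;\ldots\; \hat{y}(k+N|k)]^T \\
 &= [\hat{y}_A(k+1|k) \;\; \hat{y}_A(k+2|k) \;\ldots\; \hat{y}_A(k+N|k)]^T
  + [\hat{y}_B(k+1|k) \;\; \hat{y}_B(k+2|k) \;\ldots\; \hat{y}_B(k+N|k)]^T \\
 &= \hat{y}_A + \hat{y}_B \ \text{(say)}, \qquad (12.3.2)
\end{aligned}$$

where $\hat{y}_A$ is the known, and $\hat{y}_B$ is the unknown
component at time k, the elements of $\hat{y}_B$ being
functions of the unknown control components $\Delta u(k)$,
$\Delta u(k+1)$, etc. $\hat{y}_B$ is of the generic form

$$\hat{y}_B = G\Delta u =
\begin{bmatrix}
g_0 & & & 0 \\
g_1 & g_0 & & \\
\vdots & \vdots & \ddots & \\
g_{N-1} & g_{N-2} & \cdots & g_0
\end{bmatrix}
\begin{bmatrix}
\Delta u(k) \\ \Delta u(k+1) \\ \vdots \\ \Delta u(k+N-1)
\end{bmatrix}, \qquad (12.3.3)$$

where the elements of G are functions of the estimated
process (or controller) parameters.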

Control law 
Following (12.3.3),

$$\varepsilon = w - \hat{y} = w - \hat{y}_A - G\Delta u. \qquad (12.3.4)$$

Again, following (12.3.1) and (12.3.4), $\Delta u$ minimizing
the cost function J is given by

$$\Delta u = [G^TG + \lambda I]^{-1}G^T[w - \hat{y}_A], \qquad (12.3.5)$$

where w specifies the future set points:

$$w = [w(k+1) \;\; w(k+2) \;\ldots\; w(k+N)]^T.$$

The objective is to determine u(k), which comes from the
first element of $\Delta u$.
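The generic control law (12.3.5) can be sketched in a few lines of code. The following Python fragment is illustrative only: the function names, the small Gaussian-elimination solver (used here in place of an explicit matrix inverse) and the numerical values of the $g_i$ coefficients, the set points and the free response are all assumptions made for the example.

```python
def solve(a, b):
    """Solve the small square system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            k = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= k * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def toeplitz_lower(g, n_rows, n_cols):
    """Lower-triangular Toeplitz matrix G of (12.3.3), possibly rectangular."""
    return [[g[i - j] if 0 <= i - j < len(g) else 0.0
             for j in range(n_cols)] for i in range(n_rows)]


def lrpc_increment(g, w, y_a, lam, n_u=None):
    """Return the vector of control increments minimizing (12.3.1)."""
    n = len(w)
    n_u = n if n_u is None else n_u
    G = toeplitz_lower(g, n, n_u)
    r = [wi - yi for wi, yi in zip(w, y_a)]            # w - y_A
    gtg = [[sum(G[k][i] * G[k][j] for k in range(n)) + (lam if i == j else 0.0)
            for j in range(n_u)] for i in range(n_u)]  # G'G + lambda I
    gtr = [sum(G[k][i] * r[k] for k in range(n)) for i in range(n_u)]
    return solve(gtg, gtr)


# Hypothetical data: three-step horizon, unit set point, zero free response
g = [0.5, 1.2, 2.3]
du = lrpc_increment(g, w=[1.0, 1.0, 1.0], y_a=[0.0, 0.0, 0.0], lam=0.5)
u_prev = 0.0
print("increments:", du, " u(k) =", u_prev + du[0])
```

Only the first element of the returned vector is applied; the whole computation is repeated at the next sampling instant.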

Set point sequence 

Usually the future values of the set point are not known. So
the set point is assumed to remain constant at w(k) over the
time {k to (k+N)}. If actual values for w(k+1), w(k+2) etc.
are known (e.g., in the case of robotic movements or
spacecraft flights), these values may be used. If the
step changes in the set point (w(k) - w(k-1)) are too large,
undesirably high values of u may be required to force y to
reach the set point; this can be avoided by considering a
low-pass filtered set point trajectory w* (see Fig. 12.3.2),

$$w^*(k+i) = \frac{(1 - \alpha)\,w(k+i)}{1 - \alpha q^{-1}},$$

where $0 < \alpha < 1$ and w*(k) = w(k).
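The first-order filter above corresponds to the recursion w*(k+i) = a w*(k+i-1) + (1-a) w(k+i), where a is the filter constant between 0 and 1. A minimal Python sketch follows; the function name and the numerical values are assumptions, and the starting value of the trajectory is left as an argument so that the caller supplies whatever initial condition is intended.

```python
def filtered_setpoint(w_future, w_star_now, alpha):
    """Low-pass filtered set-point trajectory w*(k+1), ..., w*(k+N).

    w_future   : raw set points w(k+1), ..., w(k+N)
    w_star_now : starting value w*(k) of the trajectory
    alpha      : filter constant, 0 < alpha < 1
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    trajectory, previous = [], w_star_now
    for w in w_future:
        previous = alpha * previous + (1.0 - alpha) * w
        trajectory.append(previous)
    return trajectory


# Hypothetical case: trajectory starting at 0 towards a constant set point 1
print(filtered_setpoint([1.0, 1.0, 1.0, 1.0], w_star_now=0.0, alpha=0.5))
```

A larger filter constant gives a slower approach to the raw set point and hence smaller demanded control moves.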


12.4 LRPC: PULSE RESPONSE MODEL BASED 


The pulse response model has been discussed in Sec. 2.4.
Consider the model

$$y(k) = H(q^{-1})u(k) + \frac{e(k)}{\Delta}, \qquad (12.4.1)$$

where {e(k)} is an uncorrelated random sequence, and
{e(k)/$\Delta$} models the nonstationary noise; $H(q^{-1})$ is
assumed to be given by

$$H(q^{-1}) = h_1 q^{-1} + \ldots + h_M q^{-M}.$$

The parameters of $H(q^{-1})$ can be estimated using the
recursive least squares method on the incremental model

$$\begin{aligned}
\Delta y(k) &= H(q^{-1})\Delta u(k) + e(k) \\
 &= h_1\Delta u(k-1) + h_2\Delta u(k-2) + \ldots + h_M\Delta u(k-M) + e(k).
\end{aligned}$$


Output prediction 


As stated in (12.3.2), consider the predicted output over
the horizon (k+1) to (k+N) being composed of two components:

$$\hat{y} = \hat{y}_A + \hat{y}_B.$$

Here (a) $\hat{y}_A$ is the output response due to that
component of the future control input sequence
{u(k+i), i ≥ 0} which remains unchanged at u(k-1), that is
with $\Delta u(k) = \Delta u(k+1) = \ldots = 0$, and (b)
$\hat{y}_B$ is the response due to the additional components
in the control input sequence {u(k+i) - u(k-1), i ≥ 0}. So
the component vector $\hat{y}_A$ is known but $\hat{y}_B$ is
unknown, with $\hat{y}_B(k) = 0$. The concept of the two
components of the predicted output is schematically shown in
Fig. 12.3.2.


(i) Prediction y A 

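Following (12.3.9), prediction based on the estimated
parameters is given by

$$\Delta\hat{y}_A(k+i) = \hat{y}_A(k+i|k) - \hat{y}_A(k+i-1|k)
 = h^T\Delta u_A(k+i-1), \quad i = 1, 2, \ldots, N, \qquad (12.4.2)$$

where $h = [h_1 \; h_2 \ldots h_M]^T$, and

$$\Delta u_A(k+i-1) = [\underbrace{\Delta u(k+i-1), \Delta u(k+i-2), \ldots, \Delta u(k)}_{i \text{ terms} \,=\, 0 \text{ (by hypothesis)}},\; \Delta u(k-1), \ldots, \Delta u(k+i-M)]^T.$$

(ii) Prediction $\hat{y}_B$

$$\Delta\hat{y}_B(k+i) = h^T\Delta u_B(k+i-1)
 = h^T[\Delta u(k+i-1), \Delta u(k+i-2), \ldots, \Delta u(k),\; \underbrace{\Delta u(k-1), \ldots, \Delta u(k+i-M)}_{(M-i) \text{ terms} \,=\, 0}]^T.$$

Since $\hat{y}_B(k) = 0$,

$$\begin{aligned}
\hat{y}_B(k+i) &= \hat{y}_B(k+i) - \hat{y}_B(k) \\
 &= \Delta\hat{y}_B(k+i) + \Delta\hat{y}_B(k+i-1) + \ldots + \Delta\hat{y}_B(k+1) \\
 &= h^T[(u(k+i-1) - u(k-1)),\; (u(k+i-2) - u(k-1)),\; \ldots,\; (u(k) - u(k-1)),\; 0, \ldots, 0]^T \\
 &= h^T[\delta u_i,\; \delta u_{i-1},\; \ldots,\; \delta u_{i-M+1}]^T, \qquad (12.4.3)
\end{aligned}$$

where $\delta u_i = u(k+i-1) - u(k-1)$, the entries with index
$j \le 0$ being zero.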

Example: 

$$\begin{aligned}
\hat{y}_B(k+2) &= \Delta\hat{y}_B(k+2) + \Delta\hat{y}_B(k+1) \\
 &= \hat{y}_B(k+2) - \hat{y}_B(k+1) + \hat{y}_B(k+1) - \hat{y}_B(k) \\
 &= h_1(u(k+1) - u(k)) + h_2(u(k) - u(k-1)) + h_1(u(k) - u(k-1)) + 0 \\
 &= h_1(u(k+1) - u(k-1)) + h_2(u(k) - u(k-1)).
\end{aligned}$$

□□ 
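The equivalence used in (12.4.3) and in the example above can be verified numerically: accumulating the incremental predictions gives the same forced response as applying the pulse response weights directly to the input changes measured from u(k-1). The Python sketch below is illustrative; the function names and the numerical values of the pulse response and of the future control increments are assumptions.

```python
def forced_response_incremental(h, du):
    """Accumulate increments of y_B, each being sum_j h_j * du(k+i-j),
    with all increments before time k set to zero (they belong to y_A)."""
    y, out = 0.0, []
    for i in range(1, len(du) + 1):
        dy = sum(h[j - 1] * du[i - j] for j in range(1, min(i, len(h)) + 1))
        y += dy
        out.append(y)                      # y_B(k+i)
    return out


def forced_response_direct(h, du):
    """y_B(k+i) = sum_j h_j * delta_u_{i-j+1}, where
    delta_u_m = u(k+m-1) - u(k-1) is the cumulative input change."""
    delta, total = [], 0.0
    for d in du:
        total += d
        delta.append(total)                # delta[m-1] = delta_u_m
    return [sum(h[j - 1] * delta[i - j] for j in range(1, min(i, len(h)) + 1))
            for i in range(1, len(du) + 1)]


# Hypothetical pulse response and future increments du(k), du(k+1), du(k+2)
h = [0.5, 0.3, 0.1]
du = [1.0, -0.5, 0.25]
y_inc = forced_response_incremental(h, du)
y_dir = forced_response_direct(h, du)
print(y_inc)
print(y_dir)
```

Both routes should give the same three values (up to rounding), the second element being the case i = 2 worked out in the example.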

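Hence, the prediction error vector is given by

$$\varepsilon = w - \hat{y}_A - \hat{y}_B,$$

that is

$$\begin{bmatrix} \varepsilon(k+1) \\ \varepsilon(k+2) \\ \vdots \\ \varepsilon(k+p) \end{bmatrix}
= \begin{bmatrix} w(k+1) \\ w(k+2) \\ \vdots \\ w(k+p) \end{bmatrix}
- \begin{bmatrix} \hat{y}_A(k+1) \\ \hat{y}_A(k+2) \\ \vdots \\ \hat{y}_A(k+p) \end{bmatrix}
- \begin{bmatrix} h_1 & & & 0 \\ h_2 & h_1 & & \\ \vdots & \vdots & \ddots & \\ h_p & h_{p-1} & \cdots & h_1 \end{bmatrix}
\begin{bmatrix} \delta u_1 \\ \delta u_2 \\ \vdots \\ \delta u_p \end{bmatrix}
= w - \hat{y}_A - H[\delta u_B]. \qquad (12.4.4)$$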


Control Algorithm 
Consider the cost function

$$J = \varepsilon^T\varepsilon + \lambda[\delta u_B]^T[\delta u_B], \qquad (12.4.5)$$

where $\lambda$ is a positive scalar weight. This cost function
is minimized for the control law

$$\delta u_B = [H^TH + \lambda I]^{-1}H^T[w - \hat{y}_A]. \qquad (12.4.6)$$

The desired control is given by

$$u(k) = u(k-1) + \delta u_1, \qquad (12.4.7)$$

where $\delta u_1$ is the first element of $[\delta u_B]$,
computed from (12.4.6).


12.5 LRPC: STEP RESPONSE MODEL BASED 

Consider the step response model of the process

$$y(k) = S(q^{-1})u(k) + e(k)/\Delta, \qquad (12.5.1)$$

where {e(k)} is an uncorrelated random sequence, and
{e(k)/$\Delta$} models the nonstationary noise; $S(q^{-1})$ is
given by

$$S(q^{-1}) = s_1 q^{-1} + \ldots + s_M q^{-M}. \qquad (12.5.2)$$

Output prediction

The problem in using (12.5.1) directly for prediction is
the nonconvergence of the $\{s_i\}$ sequence. Instead consider
the $\hat{y}_B$ components in (12.4.2), which can be expressed
in terms of the $s_i$ parameters of the step response model
as follows.

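Consider a special case: N = 3, for which, with
$s_i = h_1 + \ldots + h_i$,

$$H[\delta u_B]\big|_{N=3} =
\begin{bmatrix} h_1 & 0 & 0 \\ h_2 & h_1 & 0 \\ h_3 & h_2 & h_1 \end{bmatrix}
\begin{bmatrix} \Delta u(k) \\ \Delta u(k) + \Delta u(k+1) \\ \Delta u(k) + \Delta u(k+1) + \Delta u(k+2) \end{bmatrix}
= \begin{bmatrix} s_1 & 0 & 0 \\ s_2 & s_1 & 0 \\ s_3 & s_2 & s_1 \end{bmatrix}
\begin{bmatrix} \Delta u(k) \\ \Delta u(k+1) \\ \Delta u(k+2) \end{bmatrix},$$

using (12.3.11). Therefore, $\hat{y}_B$ in (12.4.2) can be
expressed as

$$H[\delta u_B] =
\begin{bmatrix} s_1 & & & 0 \\ s_2 & s_1 & & \\ \vdots & \vdots & \ddots & \\ s_N & s_{N-1} & \cdots & s_1 \end{bmatrix}
\begin{bmatrix} \Delta u(k) \\ \Delta u(k+1) \\ \vdots \\ \Delta u(k+N-1) \end{bmatrix}
= S[\Delta u]. \qquad (12.5.3)$$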


Hence, following (12.4.3), the prediction error vector
$\varepsilon$ can be restated as

$$\varepsilon = w - \hat{y}_A - S[\Delta u]. \qquad (12.5.4)$$

Control law

Consider the cost function

$$J = \varepsilon^T\varepsilon + \lambda[\Delta u]^T[\Delta u], \qquad (12.5.5)$$

where $\lambda$ is the scalar cost on the control increments.
This cost function is minimized for the control law

$$\Delta u = [S^TS + \lambda I]^{-1}S^T[w - \hat{y}_A]. \qquad (12.5.6)$$

The desired control is given by

$$u(k) = u(k-1) + \Delta u(k),$$

where $\Delta u(k)$ is the first element of the vector
$\Delta u$ in (12.5.6).
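The link between the pulse response and the step response formulations rests on the step response coefficients being running sums of the pulse response coefficients. The following Python sketch checks, for hypothetical numbers, that the pulse response matrix applied to the cumulative input changes gives the same forced response as the step response matrix applied to the control increments; all names and values are assumptions made for illustration.

```python
def cumsum(x):
    """Running sum: [x1, x1 + x2, x1 + x2 + x3, ...]."""
    out, total = [], 0.0
    for v in x:
        total += v
        out.append(total)
    return out


def lower_toeplitz(c, n):
    """n x n lower-triangular Toeplitz matrix with first column c."""
    return [[c[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def mat_vec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical pulse response and future control increments (N = 3)
h = [0.5, 0.3, 0.1]
du = [1.0, -0.5, 0.25]

s = cumsum(h)                 # step response: s_i = h_1 + ... + h_i
delta_u = cumsum(du)          # input changes measured from u(k-1)

y_b_pulse = mat_vec(lower_toeplitz(h, 3), delta_u)   # H applied to delta_u
y_b_step = mat_vec(lower_toeplitz(s, 3), du)         # S applied to increments
print(s)
print(y_b_pulse)
print(y_b_step)
```

The two products agree, so the control law can be written in either parameterization.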


12.6 GENERALIZED PREDICTIVE CONTROL 

Consider a CARIMA model of the process:

$$A(q^{-1})y(k) = B(q^{-1})u(k-1) + C(q^{-1})\frac{e(k)}{\Delta}, \qquad (12.6.1)$$

where {e(k)} is an uncorrelated random sequence, and
{e(k)/$\Delta$} models the nonstationary noise; the time delay
between the input and the output is assumed to be at least 1.

Output prediction 

The prediction of the output when the process is represented 
by a CARIMA (or ARIMAX) model is discussed in Sec.5.2.1; 
some basic steps for design are restated here. 

Consider the explicit prediction approach. The p-step
ahead prediction is given by

$$\hat{y}(k+p|k) = F_p(q^{-1})y(k) + G_p(q^{-1})\Delta u(k+p-1),$$

where

$$G_p(q^{-1}) = E_p(q^{-1})B(q^{-1}) = g_0 + g_1 q^{-1} + g_2 q^{-2} + \ldots,$$

and the parameters of $E_p$ and $F_p$ are obtained from the
identity

$$C(q^{-1}) = E_p(q^{-1})\Delta A(q^{-1}) + q^{-p}F_p(q^{-1}); \qquad (12.6.2)$$

the degree of $E_p$ is $\delta E_p = p-1$, and $\delta F_p \le \delta(\Delta A) - 1$.

The parameters of $A(q^{-1})$ and $B(q^{-1})$ are estimated from
the process model (12.6.1). For each value of p, the first
p-1 parameters of $E_p$ are the same as the p-1 parameters of
$E_{p-1}$; the p-th parameter of $E_p$ and the parameters of
$F_p$ are computed from (12.6.2). Again, the first p
parameters of $G_p$: $g_0, g_1, \ldots, g_{p-1}$, will remain
unchanged as p is incremented. The recursive estimation of
the parameters of $E_p$ and $F_p$ is discussed in Appendix 5A.

The multistep prediction is given by

$$\hat{y}(k+i|k) = f_{i0}\,y(k) + f_{i1}\,y(k-1) + \ldots + g_0\Delta u(k+i-1) + g_1\Delta u(k+i-2) + \ldots$$

where $f_{i0}$ etc. are the parameters of $F_p$ obtained from
the i-th solution of the identity (12.6.2).

In (12.6.1), if the time delay is d instead of 1, the
multistep prediction will be given by

$$\hat{y}(k+i|k) = f_{i0}\,y(k) + f_{i1}\,y(k-1) + \ldots + g_0\Delta u(k+i-d) + g_1\Delta u(k+i-d-1) + \ldots$$

Example: If i = 5,

$$\begin{aligned}
\hat{y}(k+5|k) = {} & (f_{50}\,y(k) + f_{51}\,y(k-1) + \ldots) \\
 & + g_0\Delta u(k+4) + g_1\Delta u(k+3) + g_2\Delta u(k+2) + g_3\Delta u(k+1) + g_4\Delta u(k) \\
 & + (g_5\Delta u(k-1) + g_6\Delta u(k-2)),
\end{aligned}$$

where the terms within the parentheses are known and the
rest are unknown. Thus

$$\hat{y}(k+i|k) = \hat{y}_A(k+i|k) + \hat{y}_B(k+i|k),$$

where the $\hat{y}_B$ component of the predicted output
$\hat{y}$ is a function of $\Delta u(k)$, $\Delta u(k+1)$, etc.,
which are unknown, whereas the $\hat{y}_A$ component is known.
So the output vector $\hat{y}$ can be predicted as


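$$\hat{y} =
\begin{bmatrix} \hat{y}(k+1|k) \\ \hat{y}(k+2|k) \\ \vdots \\ \hat{y}(k+N|k) \end{bmatrix}
= \hat{y}_A +
\begin{bmatrix} g_0 & & & 0 \\ g_1 & g_0 & & \\ \vdots & \vdots & \ddots & \\ g_{N-1} & g_{N-2} & \cdots & g_0 \end{bmatrix}
\begin{bmatrix} \Delta u(k) \\ \Delta u(k+1) \\ \vdots \\ \Delta u(k+N-1) \end{bmatrix}
= \hat{y}_A + G[\Delta u] \ \text{(say)}, \qquad (12.6.3)$$

where $\hat{y}_A$ and $G[\Delta u]$ represent all the known and
the unknown quantities respectively. The prediction error
vector $\varepsilon$ is given by

$$\varepsilon = w - \hat{y} = w - \hat{y}_A - G[\Delta u]. \qquad (12.6.4)$$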

Control law 

The control law is derived based on the minimization of the
cost function

$$J = \sum_{i=N_1}^{N_1+N-1} [w(k+i) - \hat{y}(k+i)]^2 + \sum_{i=1}^{N_u} \lambda(i)[\Delta u(k+i-1)]^2, \qquad (12.6.5a)$$
where

$N_1$ = start of the output horizon,

N = length of the output horizon,

$N_u$ = length of the control input horizon,

$\lambda(i)$ = the positive control costing factor.

In the basic form, if $N_1 = 1$, $N_u = N$, and
$\lambda(i) = \lambda$, using (12.6.4) the cost function
(12.6.5a) can be restated as

$$J = \varepsilon^T\varepsilon + \lambda[\Delta u]^T[\Delta u]. \qquad (12.6.5b)$$

The control vector minimizing J is given by

$$\Delta u = [G^TG + \lambda I]^{-1}G^T(w - \hat{y}_A). \qquad (12.6.6)$$

Hence the optimal control law is

$$u(k) = u(k-1) + \Delta u(k), \qquad (12.6.7)$$

$\Delta u(k)$ being the first element of $\Delta u$ in (12.6.6).

Example 12.6 Compute the generalized predictive control law
for the process given by

$$(1 - 3q^{-1} + 2q^{-2})y(k) = (0.5 - 0.8q^{-1})u(k-1) + \frac{e(k)}{\Delta},$$

minimizing (12.6.5a), assuming $\lambda = 0.5$, $N_1 = 1$,
$N_u = N = 3$, and the desired set point being known.

Here,

$$\Delta A(q^{-1}) = (1 - q^{-1})(1 - 3q^{-1} + 2q^{-2}) = 1 - 4q^{-1} + 5q^{-2} - 2q^{-3},$$

$$B(q^{-1}) = 0.5 - 0.8q^{-1}, \text{ and } d = 1.$$

The parameters of $E_p(q^{-1})$ and $F_p(q^{-1})$ for p = 1 to 3
are computed using (12.6.2):

$$1 = E_p(q^{-1})\Delta A(q^{-1}) + q^{-p}F_p(q^{-1}).$$

For p = 1,

$E_p(q^{-1}) = 1$, and $F_p(q^{-1}) = f_{01} + f_{11}q^{-1} + f_{21}q^{-2}$;
so

$$1 = (1 - 4q^{-1} + 5q^{-2} - 2q^{-3}) + q^{-1}(f_{01} + f_{11}q^{-1} + f_{21}q^{-2}).$$

Hence $f_{01} = 4$, $f_{11} = -5$, $f_{21} = 2$.

For p = 2,

$E_p(q^{-1}) = 1 + e_1q^{-1}$, and $F_p(q^{-1}) = f_{02} + f_{12}q^{-1} + f_{22}q^{-2}$;
so

$$1 = (1 + e_1q^{-1})(1 - 4q^{-1} + 5q^{-2} - 2q^{-3}) + q^{-2}(f_{02} + f_{12}q^{-1} + f_{22}q^{-2}).$$

Hence $e_1 = 4$, $f_{02} = 11$, $f_{12} = -18$, $f_{22} = 8$.

For p = 3,

$E_p(q^{-1}) = 1 + e_1q^{-1} + e_2q^{-2}$, and $F_p(q^{-1}) = f_{03} + f_{13}q^{-1} + f_{23}q^{-2}$;
so

$$1 = (1 + e_1q^{-1} + e_2q^{-2})(1 - 4q^{-1} + 5q^{-2} - 2q^{-3}) + q^{-3}(f_{03} + f_{13}q^{-1} + f_{23}q^{-2}).$$

Hence $e_1 = 4$, $e_2 = 11$, $f_{03} = 26$, $f_{13} = -47$, $f_{23} = 22$.

Since $G_p(q^{-1}) = E_p(q^{-1})B(q^{-1})$,

$$\begin{aligned}
G_1(q^{-1}) &= 0.5 - 0.8q^{-1}, \\
G_2(q^{-1}) &= (1 + 4q^{-1})(0.5 - 0.8q^{-1}) = 0.5 + 1.2q^{-1} - 3.2q^{-2}, \\
G_3(q^{-1}) &= (1 + 4q^{-1} + 11q^{-2})(0.5 - 0.8q^{-1}) = 0.5 + 1.2q^{-1} + 2.3q^{-2} - 8.8q^{-3}.
\end{aligned}$$


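Following (12.6.3),

$$\begin{aligned}
\hat{y}(k+1|k) &= 4y(k) - 5y(k-1) + 2y(k-2) - 0.8\Delta u(k-1) + 0.5\Delta u(k), \\
\hat{y}(k+2|k) &= 11y(k) - 18y(k-1) + 8y(k-2) - 3.2\Delta u(k-1) + 1.2\Delta u(k) + 0.5\Delta u(k+1), \\
\hat{y}(k+3|k) &= 26y(k) - 47y(k-1) + 22y(k-2) - 8.8\Delta u(k-1) + 2.3\Delta u(k) + 1.2\Delta u(k+1) + 0.5\Delta u(k+2).
\end{aligned}$$

That is

$$\begin{bmatrix} \hat{y}(k+1|k) \\ \hat{y}(k+2|k) \\ \hat{y}(k+3|k) \end{bmatrix}
= \begin{bmatrix} 4 & -5 & 2 & -0.8 \\ 11 & -18 & 8 & -3.2 \\ 26 & -47 & 22 & -8.8 \end{bmatrix}
\begin{bmatrix} y(k) \\ y(k-1) \\ y(k-2) \\ \Delta u(k-1) \end{bmatrix}
+ \begin{bmatrix} 0.5 & 0 & 0 \\ 1.2 & 0.5 & 0 \\ 2.3 & 1.2 & 0.5 \end{bmatrix}
\begin{bmatrix} \Delta u(k) \\ \Delta u(k+1) \\ \Delta u(k+2) \end{bmatrix}
= \hat{y}_A + G\Delta u.$$

Thus $\hat{y}_A$ and G being known, using known values for w
and $\lambda$, $\Delta u$ given by (12.6.6) can be computed, of
which $\Delta u(k)$ is of interest (see Sec. 12.8 for
discussions on implementation aspects).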
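The coefficients of Example 12.6 can be reproduced numerically. The following Python sketch solves the identity (12.6.2) for p = 1, 2, 3 by polynomial long division and assembles the prediction matrices; the function names are illustrative assumptions, and the long-division route is simply one convenient way of solving the identity, not necessarily the recursion of Appendix 5A.

```python
def poly_mul(a, b):
    """Multiply two polynomials in q^-1 (list index = power of q^-1)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def diophantine(c, a_tilde, p):
    """Solve C = E_p * a_tilde + q^-p * F_p by p steps of long division."""
    n = max(len(c), p + len(a_tilde) - 1)
    rem = list(c) + [0.0] * (n - len(c))
    e = []
    for i in range(p):
        e_i = rem[i] / a_tilde[0]
        e.append(e_i)
        for j, a_j in enumerate(a_tilde):
            rem[i + j] -= e_i * a_j
    return e, rem[p:]


# Data of Example 12.6 (C = 1, time delay 1)
A = [1.0, -3.0, 2.0]
B = [0.5, -0.8]
C = [1.0]
N = 3
A_TILDE = poly_mul([1.0, -1.0], A)         # (1 - q^-1) A(q^-1)

F_rows, G_polys = [], []
for p in range(1, N + 1):
    E_p, F_p = diophantine(C, A_TILDE, p)
    F_rows.append(F_p)                     # multiplies y(k), y(k-1), y(k-2)
    G_polys.append(poly_mul(E_p, B))       # G_p = E_p B

g = G_polys[-1][:N]                        # g_0 ... g_{N-1}
G_matrix = [[g[i - j] if i >= j else 0.0 for j in range(N)] for i in range(N)]
# Coefficient of the known increment du(k-1) in the p-step prediction
past_du = [G_p[p] for p, G_p in enumerate(G_polys, start=1)]

print("A_tilde =", A_TILDE)
print("F rows  =", F_rows)
print("G       =", G_matrix)
print("du(k-1) coefficients =", past_du)
```

The rows of F and the columns of G agree with the matrices obtained in the example; the known part of the prediction is formed from the F rows and the coefficients of the past increment.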
12.7 LRPC: DESIGN CONSIDERATIONS

The pulse response model based LRPC, the step response model
based LRPC and GPC have been shown to have the same generic
structure as regards the cost-function minimized and the
control law generated. The difference in the performance of
these LRPC schemes is due to (a) the different process
models used, and (b) the different cost-functions optimized.

The four factors which characterize the LRPCs with
respect to the cost-function (12.6.5a) are

(1) the starting horizon ($N_1$),

(2) the length of the horizon (N),

(3) the length of the control horizon ($N_u$), and

(4) the scalar cost on the control increments ($\lambda$).

From the process characteristics point of view, whether the
controller can handle nonminimum-phase processes, open-loop
unstable processes or variable time-delay processes is of
interest, and is discussed in this section. A
nonminimum-phase process has a G which is rank deficient.
Correct choice of control-cost $\lambda$ and time-delay d is
usually difficult, and it is important that the controller
performs well even with crude choices, typically with
$\lambda = 0$ and d = 1.

Starting horizon 

Ideally the starting horizon $N_1$ should be equal to the
maximum time-delay between the input and output. For all
three models (12.4.1), (12.5.1) and (12.6.1), a default
value of $N_1 = 1$ has been used. The time-delay should
preferably be overestimated. An underestimated time-delay
(i.e. $N_1 < d$) leads to a nonminimum-phase representation of
the process even though the process may actually be minimum
phase. Algebraically, if $N_1 < d$, the first $d - N_1$ rows of
matrix G in (12.3.3) can be zero, so $G^TG$ will be
noninvertible, and a nonzero $\lambda$ will be required to
ensure invertibility of $[G^TG + \lambda I]$. Alternatively, a
method which can be applied to invert even rank-deficient
matrices has to be used (see for example the SVD based
implementation, discussed in Sec. 12.8).

If specifying the time-delay d is difficult, it may be
assumed that $d = N_1 = 1$, and the LRPCs will still be able
to produce satisfactory control.
Length of the horizon

One of the main features of LRPCs is that a horizon N (>1)
is considered over which the predicted output should agree
with the set point, whereas in the case of STC, N = 1. For a
nonminimum-phase process, the step response of which has an
initial negative-going response (for example see Fig. 12A.2
in Appendix 12A), the horizon should be long enough to
include the positive-going response; otherwise the control
will not be stable. In general, the horizon should
preferably be long enough to correspond to the rise time of
the process.

Length of the control horizon and the scalar cost 

The length of the control horizon $N_u$ defines the length of
the vector $\Delta u$ in (12.3.3). For a finite $N_u$, an
infinite scalar cost $\lambda$ is assumed for the control
increments beyond the control horizon $N_u$; in other words,

$$\Delta u(k+j-1) = 0, \quad j > N_u.$$

That is, beyond $N_u$, the control is assumed to remain
unchanged at the value $u(k+N_u-1)$.

Remark: For LRPCs based on pulse response models, since
control increments with respect to u(k-1) are considered,
for a finite $N_u$, $\delta u(k+j-1) = 0$ ($j > N_u$) and hence
the control remains unchanged at the u(k-1) value from
$(k+N_u)$ onwards.

□□

Consider the different possible values for $N_u$ as follows.

(a) If $N_u = 1$, $\Delta u(k+i) = 0$ for $i \ge 1$. At (k+1),
(k+2) etc., since the control remains unchanged at u(k), the
process may be considered to be running open-loop. Now, if
$\lambda = 0$, the control will fail for an open-loop unstable
process. However, open-loop stable, nonminimum-phase
processes can be controlled with $N_u = 1$ and $\lambda = 0$, if
the horizon N is sufficiently long as discussed earlier. The
main attraction is that for $N_u = 1$, G is an N-vector, and
the control computation (12.3.5) becomes a scalar
computation.

(b) If $N_u = N$, G is an N×N matrix; G has to be a full rank
matrix for $[G^TG + \lambda I]^{-1}$ to exist with $\lambda = 0$,
otherwise $[G^TG + \lambda I]^{-1}$ has to be computed using SVD.

(c) If $1 < N_u < N$, G will be an $N \times N_u$ rectangular
matrix. $N_u$ can be chosen equal to the rank of G, to ensure
invertibility of $G^TG$. For example, if N = 4 and $N_u = 3$,

$$G = \begin{bmatrix} g_0 & 0 & 0 \\ g_1 & g_0 & 0 \\ g_2 & g_1 & g_0 \\ g_3 & g_2 & g_1 \end{bmatrix};$$

even if $g_0 = 0$, G will be full rank and $G^TG$ will be
invertible. Thus $N_u$ should be at least equal to the number
of unstable or nearly unstable or oscillatory poles of the
controller, for stable control performance even with
$\lambda = 0$. From a computation point of view, small $N_u$ is
preferable.
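For the case $N_u = 1$ in (a) above, the control law reduces to a ratio of two scalars, and the role of the horizon length for a nonminimum-phase process is easy to see numerically. The Python sketch below is illustrative only; the function name and the step response values (with an initial negative-going sample) are hypothetical.

```python
def du_single_move(g, w, y_a, lam=0.0):
    """Scalar form of (12.3.5) when the control horizon is one:
    the increment is g'(w - y_A) / (g'g + lambda), with g the first column of G."""
    num = sum(gi * (wi - yi) for gi, wi, yi in zip(g, w, y_a))
    den = sum(gi * gi for gi in g) + lam
    return num / den


# Hypothetical nonminimum-phase step response: first sample goes the wrong way
g = [-0.2, 0.1, 0.5, 0.8]
w = [1.0, 1.0, 1.0, 1.0]       # constant set point over the horizon
y_a = [0.0, 0.0, 0.0, 0.0]     # free response (process at rest)

short = du_single_move(g[:1], w[:1], y_a[:1])   # horizon N = 1
long_ = du_single_move(g, w, y_a)               # horizon N = 4
print("N = 1:", short)
print("N = 4:", long_)
```

With N = 1 the computed increment has the wrong sign for the eventual positive-going response, whereas the longer horizon that includes the positive-going part gives a move in the right direction.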


Remarks 

(a) Usual settings 

The usual setting is $N_1$ = maximum possible time delay and
$N_u = 1$. N should be at least equal to the model order and
cover the rise time of the process; typically N = 10.
Complex processes require $N_u > 1$.

(b) Process models 

(i) The required model order for CARIMA representation is 
much less than that for a pulse response model or a 
step response model, which is the main advantage of 
using GPC. 

(ii) All the LRPC schemes discussed in this chapter have 
integrating noise structure, which ensures disturbance 
rejection with zero offset. 

(c) Alternative cost-functions 

Since LRPC involves optimization of a finite-horizon 
cost-function (as against infinite stage cost minimization 
in the case of the LQ control discussed in the next 
chapter), constrained cost-functions or even nonquadratic 
cost-functions may be accommodated. 

(d) Comparative features of IDCOM, MAC and DMC methods

The conventional IDCOM and MAC control strategies are
pulse response model based, and use $N_1 = 1$, $N_u = N$ and
$\lambda = 0$. Since $G^TG$ has to be invertible for stable
control, nonminimum-phase processes or processes with
varying time delay cannot be controlled. The conventional
DMC, which is step response model based, uses $N_1 = 1$,
whereas N, $N_u$ and $\lambda$ can be chosen as desired. So with
proper design, nonminimum-phase processes or processes with
varying time-delay can be controlled.
12.8 IMPLEMENTATION ASPECTS OF LRPC

The parameter estimation aspects have been discussed in
Chapter 3. The computation of the predictive control law
involves two main problems:

(i) multistep prediction of the output over the horizon of
interest, and

(ii) computation of the matrix inverse of the generic form
$[G^TG + \lambda I]^{-1}$.

Multistep prediction of output has been discussed in Chapter 
5. The present discussions are confined to the latter 
matrix inversion problem. Two popular methods are discussed. 

Implementation using singular value decomposition 

Matrix inversions using the singular value decomposition
(SVD), studied in Sec. 3.3.2, will be followed here.

Consider the SVD of the m×n matrix G: $G = U\Sigma V^T$, where
U and V are orthogonal matrices, and $\Sigma$ is a diagonal
matrix: $\Sigma = \mathrm{diag}[\sigma_1 \; \sigma_2 \ldots \sigma_p : 0]$,
where the singular values $\sigma_1$, $\sigma_2$ etc. appear
along the diagonal in nonincreasing order:
$\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_p \ge 0$. Hence

$$G^TG + \lambda I = V\Sigma U^TU\Sigma V^T + V\lambda IV^T = V[\Sigma^2 + \lambda I]V^T. \qquad (12.8.1)$$

So

$$[G^TG + \lambda I]^{-1} = V[\Sigma^2 + \lambda I]^{-1}V^T,$$

leading to

$$u_c = [G^TG + \lambda I]^{-1}G^T[w - \hat{y}_A] = V[[\Sigma^2 + \lambda I]^{-1}\Sigma]U^T[w - \hat{y}_A]. \qquad (12.8.2)$$

Here, $[\Sigma^2 + \lambda I]^{-1}\Sigma$ is a diagonal matrix of
which

$$\text{the } i\text{-th diagonal element} = \frac{\sigma_i}{\sigma_i^2 + \lambda}.$$

The singular values which are insignificantly small may be
eliminated (through truncation of the diagonal matrix
$[(\Sigma^2 + \lambda I)^{-1}\Sigma]$) before computing the
control law in (12.8.2). Since only the first element in
$u_c$ is to be calculated to determine u(k), only the first
row of V will be used in the computation using (12.8.2).
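The diagonal factors in (12.8.2), together with the truncation of insignificant singular values, can be sketched as follows. The Python fragment assumes that the singular values have already been obtained from some SVD routine; the function name, the relative tolerance `rel_tol` and the numerical values are assumptions made for illustration.

```python
def svd_gains(sigmas, lam, rel_tol=1e-8):
    """Diagonal elements sigma_i / (sigma_i**2 + lambda) of (12.8.2).

    Singular values not exceeding rel_tol times the largest one are
    treated as zero, and their factors are set to zero (truncation).
    """
    cutoff = rel_tol * max(sigmas)
    return [s / (s * s + lam) if s > cutoff else 0.0 for s in sigmas]


# Hypothetical singular values of a rank-deficient G
sigmas = [2.0, 0.5, 0.0]
print(svd_gains(sigmas, lam=0.0))    # truncation avoids division by zero
print(svd_gains(sigmas, lam=0.1))    # control costing damps every factor
```

With zero control cost, the truncation is what keeps the computation defined when G is rank deficient; a nonzero cost additionally reduces the factors of the small singular values.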
Implementation using UD factorization

UD factorization of a square matrix P is given by
$P = UDU^T$, U being an upper triangular and D being a
diagonal matrix. The present matrix inversion problem has
some structural similarity with the recursive least squares
estimation problem, for which UD factorization is popularly
used (see Sec. 3.4.2 and Appendix 3A). U should not be
confused with the orthogonal matrix U obtained through SVD,
which is different.

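Consider a lower triangular matrix G, partitioned into its
rows:

$$G = \begin{bmatrix} g_0 & 0 & 0 \\ g_1 & g_0 & 0 \\ g_2 & g_1 & g_0 \\ g_3 & g_2 & g_1 \end{bmatrix}
= \begin{bmatrix} r_1^T \\ r_2^T \\ r_3^T \\ r_4^T \end{bmatrix} \ \text{(say)}.$$

Hence

$$[G^TG + \lambda I]^{-1} = [r_1r_1^T + r_2r_2^T + r_3r_3^T + r_4r_4^T + \lambda I]^{-1}. \qquad (12.8.3)$$

Referring to the parameter estimation problem, the
covariance update is given by

$$P^{-1}(k) = P^{-1}(k-1) + x(k)x^T(k), \qquad (12.8.4)$$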

where x is the new data (column) vector received. Using the 
similarity between (12.8.3) and (12.8.4), amd defining 
P = [GG+XI], (12.8.3) cam be computed recursively as 
f ollows: 

(i) Initialize $P^{-1}(k-1) = \lambda I$, that is

\[ P(k-1) = I/\lambda = U(k-1)D(k-1)U^T(k-1). \]

So $U(k-1) = I$, and $D(k-1) = I/\lambda$. Let j = 1.

(ii) Define $x(k) = r_j$.

(iii) Use the UD measurement update routine to compute U(k) and
D(k), where

\[ P(k) = U(k)D(k)U^T(k). \]

(iv) Increment j and go to (ii) until the computation of
(12.8.3) is complete, which is given by the final P. If G
is an mxn matrix (m>n), m calls to the UD measurement
routine will be necessary.
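
The row-by-row recursion in steps (i) to (iv) can be illustrated with the sketch below. Note the substitution: instead of a UD measurement update routine, it applies the algebraically equivalent rank-one (Sherman-Morrison) update directly to P, which is simpler to show but lacks the numerical advantages of the UD form. The function name is hypothetical.

```python
def recursive_regularized_inverse(G, lam):
    """Return P = [G^T G + lam*I]^{-1}, processing one row of G per update.

    Plain rank-one (Sherman-Morrison) update, used here as a stand-in for
    the UD measurement update; lam must be positive for the initialization.
    """
    n = len(G[0])
    # Step (i): P^{-1} = lam*I, hence P = I/lam.
    P = [[(1.0 / lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    for row in G:  # steps (ii)-(iv): x(k) = r_j for each row of G
        Px = [sum(P[i][j] * row[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(row[i] * Px[i] for i in range(n))
        # P <- P - (P x)(P x)^T / (1 + x^T P x); P is symmetric.
        P = [[P[i][j] - Px[i] * Px[j] / denom for j in range(n)]
             for i in range(n)]
    return P


if __name__ == "__main__":
    G = [[1.0, 0.0], [2.0, 1.0]]
    for r in recursive_regularized_inverse(G, lam=0.5):
        print(r)
```

One update is performed per row of G, matching the count of m calls stated in step (iv).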

Remarks 

(a) UD factorization based implementation can be computed 
recursively, and is computationally more efficient than the 
SVD based approach, which cannot be computed recursively. 

(b) SVD based implementation is numerically more robust.
The noninvertibility of G, with or without nonzero $\lambda$, can be
easily handled with appropriate truncation of the U, $\Sigma$, and V
matrices. The rank of G obtained through the singular values
can also be used as a diagnostic feature; the rank
deficiency of G indicates the number of unstable zeros in
the process and the desired length of the control horizon.


12.9 CONCLUSIONS 

A brief outline of some of the popular predictive control 
methods has been presented. All the methods studied have 
been reported to have been successful in real-life 
applications. 

The self-tuning control (STC) and the long range
predictive control (LRPC) methods basically involve two
stages: parameter estimation and design of the control law;
in both these stages different choices are possible, which
ascribe different properties to the control algorithms. In
STC, the costing ($\lambda$) of the control increments protects the
controller from instability in nonminimum-phase processes,
although the optimum choice of $\lambda$ is not easy. The two
prime stabilizing features of LRPC methods are the costing
of the future control increments and optimization of a
multistage cost-function over a predictive horizon. These
controllers can handle nonminimum-phase processes with
relative ease (even with vanishingly small $\lambda$), and in
general possess better stability and robustness properties
than STC. Although the controller design part is on stronger
grounds, more work needs to be done to make the parameter
estimator equally robust.

Out of the different available designs of LRPCs, it is 
the generalized predictive control (GPC) strategy, which is 
of particular interest, as most other LRPC designs can be 
shown to be subsets of GPC. 

For successful implementation, it is usually necessary
to install safety constraints on the estimated parameter
values and also on the control increments, based on the prior
knowledge of the system. The inherent nonlinearity of the
associated equipment (like the control valve etc.) also
needs to be taken into consideration.


CHAPTER 13


PREDICTIVE CONTROL (Part-II): STATE-SPACE MODEL BASED 


Predictive controllers can be designed in the 
state-space framework with specified end-point 
reference conditions. 


13.1 INTRODUCTION 

Long range predictive control (LRPC) methods formulated 
using transfer function models were discussed in Chapter 12. 
This chapter is devoted to the study of state-space 
formulation of linear quadratic (LQ) controllers, which form 
another popular class of predictive controllers. Here, the 
process is represented by a linear state-space model, and 
the cost criterion is a quadratic function of the states and 
the control inputs. If the disturbances to the process (as 
expressed by the model), are Gaussian in nature, the LQ 
control is referred to as the linear quadratic Gaussian 
(LQG) control. 

The following comparative features of state-space LQ 
control and LRPC may be noted. 

(a) Both the classes of control consider a linear model of 
the process, and both minimize scalar quadratic cost- 
functions to compute the control law u(k). 

(b) The LRPC methods use the present time k as the terminal 
reference point, and the control u(k) is determined based on 
multistep predictions of the output over a finite horizon in 
the forward direction into the future. On the other hand, 
the state-space based LQ control considers a time-point at 
the end of a horizon (k+N) as the terminal point; the 
control u(k) can be computed through backward recursion of 
the control algorithm to the present time, satisfying the 
specified terminal conditions. N, the length of the horizon, 
can be finite or infinite. 

(c) Under certain conditions, the LQ control in state-space 
formulation and the LRPC produce identical results. 

The state-space formulation of LQ control is
widely covered in many papers and texts; this chapter
provides a brief exposure to the basic concepts, results and
the implementation aspects. There are two basic issues in
the design of LQ controllers: (i) whether the process is
modelled as a deterministic or a stochastic process, and
(ii) whether the parameters of the process model are time
invariant and known or are time varying and unknown. The
present study covers the design of LQ control both for
deterministic and stochastic processes; both time invariant
and time varying processes are considered.

The study of LQ control of a time invariant deterministic
process is presented in Sec. 13.2. The control law u(k)
has two elements: $u(k) = -k^T(k)x(k)$, where k is the control
gain and x is the estimated state. In the case of a
stochastic process, the LQ controller may be designed by
invoking the separation theorem, which permits the state
estimation and the controller design to be considered as two
separate problems. The separation theorem is introduced in
Sec. 13.3, and the design of the stochastic LQ controller as
an optimal state estimator associated with an optimal
deterministic controller is discussed. Sec. 13.4 demonstrates
that the algebraic steps for the solution of the
deterministic LQ control problem and the state estimation
problem are duals of each other.

For time varying stochastic processes, the process
parameters are estimated from an input-output model, and the
estimated parameters are used for state estimation, as
discussed in Secs. 13.5 and 13.6. The computation of the
deterministic control law and the associated implementation
aspects are considered in Sec. 13.7.

This chapter is supported by three appendices. Derivation
of the LQ control law for a multivariable process is
detailed in Appendix 13A (note that the studies presented in
this chapter consider a single-input single-output process).
Appendix 13B discusses the transmittance matrix formulation
and its implementation, which concern the estimation of the
state in a stochastic environment. Appendix 13C presents the
UD time-update algorithm and its implementation, which is
used in the computation of the control law.


13.2 LQ CONTROL OF A DETERMINISTIC PROCESS 

Consider a single-input (u) and a single-output (y) process
represented by the deterministic model

\[ x(k+1) = Ax(k) + bu(k), \tag{13.2.1} \]

\[ y(k) = c^T x(k), \tag{13.2.2} \]

where x is an n-state-vector, y is the measured output and u
is the deterministic control input; A is an nxn matrix, and
b and c are n-vectors. The initial state x(0) is assumed to
be known. Since the model is deterministic, it is understood
that there is no uncertainty (in terms of process noise or
measurement noise etc.) present, and complete knowledge of
the states is available.

The objective is to produce the optimal control
decisions u(k), u(k+1), ..., u(k+N-1), so as to minimize the
scalar cost function

\[ J = x^T(k+N)Q_N x(k+N) + \sum_{i=k}^{k+N-1} \left( x^T(i)Qx(i) + \lambda u^2(i) \right), \tag{13.2.3} \]

where $Q_N$ and Q are symmetric positive semidefinite matrices,
and $\lambda$ is a positive scalar cost; $\lambda$ may be conditionally
zero or vanishingly small as discussed later.

There are various methods for solving the present
optimal control problem (Strejc, 1981). One of the popular
approaches is the method of dynamic programming, which is
based on the principle of optimality.

Remark: The principle of optimality (Bellman, 1957) states
that an optimal control sequence has the property that 
whatever the initial state and the initial control are, the 
remaining controls must constitute an optimal sequence with 
regard to the state resulting from the initial control. 

□□ 

The detailed solution of the optimal control problem using 
dynamic programming appears in Appendix 13A; the main 
results are presented here. 

Let the cost at the last stage of the horizon {k, k+N} be
defined as

\[ J_N = x^T(k+N)P(k+N)x(k+N), \]

where $P(k+N) = Q_N$. It can be shown that starting from the
prespecified terminal condition $P(k+N) = Q_N$, the optimal
control law can be computed through backward recursion from
one stage to the next, with each stage having identical
structure, until the present stage is reached. At each stage
i ($k \le i < k+N$), the gain vector k(i) is computed using P(i+1),
and the control u(i) is determined; P(i) is next computed to
be used in the next stage i-1, until i = k.

Figure 13.2.1 Deterministic state-space model of the
process with optimal LQ control. k(k) is the Kalman
gain or the feedback gain.

Summing up, the general solution of the deterministic
optimal control problem is given by

\[ k^T(k) = (\lambda + b^T P(k+1)b)^{-1} b^T P(k+1)A, \tag{13.2.4} \]

\[ P^*(k) = P(k+1) - P(k+1)b(\lambda + b^T P(k+1)b)^{-1} b^T P(k+1), \tag{13.2.5} \]

\[ P(k) = Q + A^T P^*(k)A, \tag{13.2.6} \]

\[ u(k) = -k^T(k)x(k), \tag{13.2.7} \]

where $P(k+N) = Q_N$. The schematic diagram of the deterministic
process with LQ control u is shown in Fig. 13.2.1.
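
The backward recursion (13.2.4) to (13.2.6) can be sketched in a few lines. This is a minimal illustration in plain Python lists, assuming a single-input model; the function names `lq_backward_step` and `lq_gain` are hypothetical, and no attempt is made at the numerically robust (UD factorized) implementation discussed later in the chapter.

```python
def lq_backward_step(A, b, Q, lam, P_next):
    """One backward stage: from P(k+1) compute the gain k(k) and P(k)."""
    n = len(b)
    Pb = [sum(P_next[i][j] * b[j] for j in range(n)) for i in range(n)]  # P(k+1) b
    denom = lam + sum(b[i] * Pb[i] for i in range(n))                    # lam + b^T P b
    # (13.2.4): k^T = b^T P(k+1) A / denom, using b^T P = (P b)^T (P symmetric)
    k = [sum(Pb[i] * A[i][j] for i in range(n)) / denom for j in range(n)]
    # (13.2.5): P* = P - P b b^T P / denom
    P_star = [[P_next[i][j] - Pb[i] * Pb[j] / denom for j in range(n)]
              for i in range(n)]
    # (13.2.6): P(k) = Q + A^T P* A
    PA = [[sum(P_star[i][m] * A[m][j] for m in range(n)) for j in range(n)]
          for i in range(n)]
    P = [[Q[i][j] + sum(A[m][i] * PA[m][j] for m in range(n)) for j in range(n)]
         for i in range(n)]
    return k, P


def lq_gain(A, b, Q, Q_N, lam, N):
    """Run N backward stages from P(k+N) = Q_N; return k(k) and P(k)."""
    P, k = Q_N, None
    for _ in range(N):
        k, P = lq_backward_step(A, b, Q, lam, P)
    return k, P


if __name__ == "__main__":
    # Scalar integrator x(k+1) = x(k) + u(k), with Q = Q_N = 1 and lam = 1.
    print(lq_gain([[1.0]], [1.0], [[1.0]], [[1.0]], 1.0, 2))
```

For the scalar example in the usage lines, two backward stages give a gain of about 0.6 and P(k) of about 1.6; the control then follows from (13.2.7) as u(k) = -k(k)x(k).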

Remarks 

(1) $\lambda = 0$ may be permitted in (13.2.3) only if $b^T P(k+1)b$ in
(13.2.4) is ensured to be nonzero; however, if the implementation
involves explicit inversion of $\lambda$, a positive $\lambda$ will
be required.

(2) The first term in (13.2.3) penalizes the terminal
deviation of the state from zero set-point. This is why the
present control problem is called a linear regulator problem.
Non-zero set-points can be accommodated by expressing
the measurement equation in terms of $\epsilon(k)$ instead of y(k):

\[ \epsilon(k) = y(k) - w(k) = c^T x(k), \]

w being the set-point; for further details see Sec. 13.5.


13.3 SEPARATION THEOREM AND CONTROL OF 
A STOCHASTIC PROCESS 

In the last section, the optimal control law was derived for a
linear deterministic model; the process was assumed to be 
free from unknown disturbances, and complete knowledge of 
the states was assumed to be available. In practice, the 
process may not be exactly linear, nor are the states likely 
to be exactly known; there may be noise associated with the 
measurements, and unknown disturbances may be acting on the 
process. Even if the process is time invariant, due to the 
various uncertainties present, the variables x and y which 
characterize the process become stochastic variables, and 
the consequent model is called a stochastic model. 


13.3.1 The Control Problem 
Process model 

Let the process be represented by the model

\[ x(k+1) = Ax(k) + bu(k) + v_1(k), \tag{13.3.1} \]

\[ y(k) = c^T x(k) + v_2(k), \tag{13.3.2} \]

where the process noise $\{v_1(k)\}$ and the measurement noise
$\{v_2(k)\}$ are assumed to be zero-mean Gaussian white sequences
with known statistics. The initial state x(0) is a zero-mean
Gaussian random vector with positive semidefinite covariance
matrix P(0). Usually x(0), $\{v_1(k)\}$ and $\{v_2(k)\}$ are assumed
to be mutually independent, which is not a limitation.

Control objective 

The objective is to compute the control sequence u(k),
u(k+1), ..., u(k+N-1) so as to minimize the scalar cost

\[ J = E\left\{ x^T(k+N)Q_N x(k+N) + \sum_{i=k}^{k+N-1} \left( x^T(i)Qx(i) + \lambda u^2(i) \right) \right\}, \tag{13.3.3} \]

where $Q_N$ and Q are positive semidefinite symmetric matrices,
and $\lambda$ is a positive scalar constant.

Figure 13.3.1 LQ control of a stochastic process using
the separation theorem. The state estimator and the
deterministic controller are treated as separate
problems.


13.3.2 Separation Theorem and Controller Synthesis 


According to the separation theorem, the present optimi- 
zation problem (concerning the determination of the optimal 
control sequences for the stochastic process) has two 
separate components, which can be individually optimised 
(see Fig.13.3.1): 

(i) the deterministic optimal control problem, which is 
solved for the optimal LQ controller gain k(k) (13.2.4), and 

(ii) the stochastic state estimation problem which is solved 
for the optimal state estimate x(k|k). 

The desired optimal solution is given by

\[ u(k) = -k^T(k)x(k|k). \]

Features consequent to separation theorem

(a) The LQ control (feedback) gain k(k) is independent of
the statistical characteristics of the process model
(13.3.1 - 13.3.2), i.e. P(0) and the noise statistics. k(k)
is also independent of the observed data y(k).

(b) The optimal state (x(k|k)) estimation is independent of
the optimal control problem, and hence independent of the
weighting matrices $Q_N$, Q and the scalar $\lambda$ in the cost
function (13.3.3). The past control inputs u(k-1), u(k-2)
etc. are considered deterministic data. The state may be
estimated using the Kalman filter, discussed in Sec. 6.6.

(c) Computation of k(k) involves backward iteration from 
the reference terminal point, whereas computation of x(k|k) 
involves forward iteration starting from the initial state. 


13.4 DUALITY BETWEEN LQ CONTROL AND 
STATE ESTIMATOR 

The optimal deterministic LQ control system and the optimal 
state estimator are structurally identical (Kalman, 1960); 
this relationship can be shown as follows. 

State estimation by Kalman filter 

Let a single-input single-output process be expressed as

\[ x(k+1) = Ax(k) + bu(k) + v_1(k), \tag{13.4.1a} \]

\[ y(k) = c^T x(k) + v_2(k); \tag{13.4.2a} \]

the noise sequences $\{v_1(k)\}$ and $\{v_2(k)\}$ are assumed to be
zero-mean, Gaussian white, with the covariances:

\[ E\{v_1(k)v_1^T(k)\} = R_1, \qquad E\{v_2^2(k)\} = R_2, \]

and the initial state x(0) is assumed to be a zero-mean
Gaussian random vector, with positive semidefinite covariance
matrix P(0).

The Kalman filter estimate of the state is given by

\[ x(k|k) = x(k|k-1) + k_f(k)\left( y(k) - c^T x(k|k-1) \right), \tag{13.4.3a} \]

where the optimal filter gain $k_f(k)$ is obtained from the
recursive solution of the Riccati equation as

\[ k_f(k) = P(k|k-1)c\,(c^T P(k|k-1)c + R_2)^{-1}, \tag{13.4.4a} \]

\[ P^*(k|k) = P(k|k-1) - P(k|k-1)c\,(c^T P(k|k-1)c + R_2)^{-1} c^T P(k|k-1), \tag{13.4.5a} \]

\[ P(k+1|k) = AP^*(k|k)A^T + R_1, \tag{13.4.6a} \]

starting with the specified initial conditions:
$x(0|-1) = x(0)$ and $P(0|-1) = R_0$.

LQ control of a deterministic process 
Consider the deterministic model

\[ x(k+1) = Ax(k) + bu(k), \tag{13.4.1b} \]

\[ y(k) = c^T x(k); \tag{13.4.2b} \]

the optimal control minimizing the LQ cost function (13.2.3)
is given by

\[ u(k) = -k^T(k)x(k), \tag{13.4.3b} \]

where the Kalman control gain k(k) is obtained from the
recursive solution of the Riccati equation as follows:

\[ k^T(k) = (\lambda + b^T P(k+1)b)^{-1} b^T P(k+1)A, \tag{13.4.4b} \]

\[ P^*(k) = P(k+1) - P(k+1)b(\lambda + b^T P(k+1)b)^{-1} b^T P(k+1), \tag{13.4.5b} \]

\[ P(k) = Q + A^T P^*(k)A, \tag{13.4.6b} \]

starting with the specified terminal condition $P(k+N) = Q_N$.

Remarks 

(1) Note that the sets of Equations (13.4.4a - 13.4.6a)
and (13.4.4b - 13.4.6b) are structural duals of each
other.

(2) In terms of implementation, the duality between the
optimal filter and the LQ controller permits the use of the
same measurement update routine for solving (13.4.4a,
13.4.5a) and (13.4.4b, 13.4.5b) for the filter and for the
controller respectively. Similarly the same time-update
routine can be used for solving equations (13.4.6a) and
(13.4.6b) for the filter and the controller respectively.
The measurement update routine can also be used for the
recursive least squares estimation of the parameters in time
varying processes, as discussed in the next section. The
implementation aspects are treated in detail in Sec. 13.7.

Figure 13.4.1 Duality between the estimation loop and
the control loop: (a) Estimation loop, (b) Control loop.
The duality is with respect to the determination of the
filter gain $k_f$ and the controller gain k.

(3) The duality between the optimal state estimator and the
optimal deterministic LQ controller, shown in Fig. 13.4.1,
can be summarized (in terms of correspondence of the
respective terms and features) as follows.


    State estimator                   LQ controller

    $P^*(k|k)$                        $P^*(k)$
    $P(k+1|k)$                        $P(k)$
    c                                 b
    A                                 $A^T$
    $R_2$                             $\lambda$
    $R_1$                             Q
    $k_f$                             $k^T$
    Forward iteration used to         Backward iteration used to
    compute P(k) following            compute P(k) following
    the last stage: P(k-1).           the last stage: P(k+1).
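
The correspondence above suggests that one pair of routines can serve both loops. The following is a hedged sketch of that idea with plain (unfactorized) matrices; the routine names are hypothetical, and a practical implementation would use the UD factorized updates referred to in the text.

```python
def measurement_update(P, h, r):
    """Shared step: gain = P h / (h^T P h + r), P* = P - P h h^T P / (h^T P h + r)."""
    n = len(h)
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    denom = r + sum(h[i] * Ph[i] for i in range(n))
    gain = [v / denom for v in Ph]
    P_star = [[P[i][j] - Ph[i] * Ph[j] / denom for j in range(n)] for i in range(n)]
    return gain, P_star


def time_update(Phi, P_star, W):
    """Shared step: Phi P* Phi^T + W."""
    n = len(W)
    PhiP = [[sum(Phi[i][m] * P_star[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]
    return [[W[i][j] + sum(PhiP[i][m] * Phi[j][m] for m in range(n))
             for j in range(n)] for i in range(n)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def filter_step(A, c, R1, R2, P_pred):
    """Estimator: (13.4.4a)-(13.4.6a); returns k_f(k) and P(k+1|k)."""
    k_f, P_star = measurement_update(P_pred, c, R2)
    return k_f, time_update(A, P_star, R1)


def controller_step(A, b, Q, lam, P_next):
    """Controller: (13.4.4b)-(13.4.6b); returns k(k) and P(k)."""
    g, P_star = measurement_update(P_next, b, lam)
    n = len(b)
    k = [sum(g[i] * A[i][j] for i in range(n)) for j in range(n)]  # k^T = g^T A
    return k, time_update(transpose(A), P_star, Q)
```

Calling `controller_step(A, b, Q, lam, P)` and `filter_step(transpose(A), b, Q, lam, P)` produces the same updated covariance, which is the table read row by row: c maps to b, A to its transpose, $R_2$ to $\lambda$ and $R_1$ to Q.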
13.5 LQ CONTROL OF TIME VARYING PROCESSES

So far, the parameters of the state-space model have been 
assumed to be time invariant and known. In real-life most 
processes have time-varying unknown characteristics. So, 
with the progression of time, it is necessary to reestimate 
the parameters of the process model using the available 
measurements and information. 

LQ control involves performing the following operations
at every time step.

(1) Recursive estimation of the parameters of the input-
output or transfer-function model.

(2) Estimation of the state x(k|k) using the estimated
process parameters.

(3) Computation of the control law minimizing the specified
cost function.

The LQ control problem is broadly specified in terms of the
process model and the cost function to be minimized. In this
section, the formulation of the LQ control problem in
state-space format using the input-output data is presented,
and the significance of the cost function minimized is also
analysed. The problems of state estimation and the
computation of control law are considered in the subsequent
sections.


13.5.1 Process Models 
CARMA model 

Consider a single-input single-output process given by

\[ A(q^{-1})y(k) = B(q^{-1})u(k-d) + C(q^{-1})e(k), \tag{13.5.1} \]

where

\[ A(q^{-1}) = 1 + a_1 q^{-1} + a_2 q^{-2} + \ldots + a_n q^{-n}, \]

\[ B(q^{-1}) = b_0 + b_1 q^{-1} + b_2 q^{-2} + \ldots + b_n q^{-n}, \]

\[ C(q^{-1}) = 1 + c_1 q^{-1} + c_2 q^{-2} + \ldots + c_n q^{-n}; \]

d is the time delay between the input u and the output y, and e
represents the noise and the model uncertainties. $C(q^{-1})$ is
assumed to be a stable polynomial (see Sec. 2.4.2).

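The equivalent state-space representation in observable
canonical form is given by

\[ x(k+1) = Ax(k) + bu(k) + se(k), \tag{13.5.2} \]

\[ y(k) = c^T x(k) + e(k), \tag{13.5.3} \]

where

\[ A = \begin{bmatrix} -a_1 & 1 & 0 & \cdots & 0 \\ -a_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -a_n & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}, \]

(the first column being padded with zeros when the size of x
exceeds n),

\[ b = [0\ \ldots\ 0\ \ b_0\ b_1\ \ldots\ b_n]^T, \qquad c = [1\ 0\ \ldots\ 0]^T, \]

\[ s = [(c_1 - a_1)\ (c_2 - a_2)\ \ldots\ (c_n - a_n)\ 0\ \ldots\ 0]^T. \tag{13.5.4} \]

Note that (i) (d-1) leading elements of b are zeros, and
(ii) the size of x = max (degree $A(q^{-1})$, degree $B(q^{-1})$ + d,
degree $C(q^{-1})$).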
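
As an illustration of (13.5.2) to (13.5.4), the sketch below assembles A, b, c and s from the polynomial coefficients. The function name and argument layout are hypothetical; the coefficient lists exclude the leading 1 of $A(q^{-1})$ and $C(q^{-1})$, and a delay d of at least 1 is assumed.

```python
def carma_state_space(a, b_coef, c_coef, d):
    """Observable canonical form of a CARMA model.

    a      : [a_1, ..., a_n]      (A(q^-1) without its leading 1)
    b_coef : [b_0, b_1, ...]      (coefficients of B(q^-1))
    c_coef : [c_1, c_2, ...]      (C(q^-1) without its leading 1)
    d      : time delay, assumed >= 1
    """
    size = max(len(a), len(b_coef) - 1 + d, len(c_coef))
    a_pad = list(a) + [0.0] * (size - len(a))
    c_pad = list(c_coef) + [0.0] * (size - len(c_coef))

    A = [[0.0] * size for _ in range(size)]
    for i in range(size):
        A[i][0] = -a_pad[i]          # first column holds -a_i
        if i + 1 < size:
            A[i][i + 1] = 1.0        # ones on the superdiagonal

    b = [0.0] * (d - 1) + list(b_coef)   # (d-1) leading zeros
    b += [0.0] * (size - len(b))
    c = [1.0] + [0.0] * (size - 1)
    s = [c_pad[i] - a_pad[i] for i in range(size)]
    return A, b, c, s


if __name__ == "__main__":
    # Process of Example 13.6.1: a_1 = -1.7, a_2 = 0.72, B = 0.4 + 0.8 q^-1,
    # c_1 = -0.5, d = 4.
    A, b, c, s = carma_state_space([-1.7, 0.72], [0.4, 0.8], [-0.5], 4)
    print(b)
    print(s)
```

For the process of Example 13.6.1 this yields a state of size 5, with b and s agreeing (to rounding) with the vectors quoted in that example.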

CARIMA model 

In case of the CARIMA representation of the process:

\[ A(q^{-1})y(k) = B(q^{-1})u(k-d) + C(q^{-1})e(k)/\Delta, \tag{13.5.5} \]

that is

\[ \bar{A}(q^{-1})y(k) = B(q^{-1})\Delta u(k-d) + C(q^{-1})e(k), \tag{13.5.6} \]

where

\[ \bar{A}(q^{-1}) = \Delta A(q^{-1}) = 1 + \bar{a}_1 q^{-1} + \ldots + \bar{a}_n q^{-n} + \bar{a}_{n+1} q^{-(n+1)}; \]

the corresponding state-space model is given by

\[ x(k+1) = \bar{A}x(k) + b\Delta u(k) + \bar{s}e(k), \tag{13.5.7} \]

\[ y(k) = c^T x(k) + e(k), \tag{13.5.8} \]

where $\bar{A}$ has the same structure as A in (13.5.2) with the
coefficients $\bar{a}_i$ in place of $a_i$,

\[ \bar{s} = [(c_1 - \bar{a}_1)\ (c_2 - \bar{a}_2)\ \ldots\ (c_n - \bar{a}_n)\ (-\bar{a}_{n+1})\ \ldots\ 0]^T, \tag{13.5.9} \]

b and c being the same as in (13.5.4).

CARIMA model with nonzero set point 

To incorporate nonzero set points (w(k)) in the N-stage
minimization problem of the LQ controller design, it is
necessary to know the future set points. If unknown, the
future set points may be assumed to be the same as the
present set point.

Introduce the output error

\[ \epsilon(k) = y(k) - w(k). \]

Hence

\[ \bar{A}(q^{-1})\epsilon(k) = B(q^{-1})\Delta u(k-d) - A(q^{-1})\Delta w(k) + C(q^{-1})e(k). \tag{13.5.10} \]

So the state-space model becomes

\[ x(k+1) = \bar{A}x(k) + b\Delta u(k) + w\Delta w(k+1) + \bar{s}e(k), \tag{13.5.11} \]

\[ \epsilon(k) = c^T x(k) + e(k), \tag{13.5.12} \]

where

\[ w = [-1\ \ -a_1\ \ -a_2\ \ldots\ -a_n]^T, \]

$\bar{A}$, b, $\bar{s}$ and c being the same as in (13.5.7 - 13.5.8).
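
The CARIMA quantities follow mechanically from the CARMA coefficients. The sketch below, with hypothetical function names, forms the coefficients of $\Delta A(q^{-1})$ and then the noise vector of (13.5.9) and the set-point vector w; the state size is passed in explicitly and is assumed to be at least n+1.

```python
def difference_polynomial(a_full):
    """Coefficients of (1 - q^-1) * A(q^-1), with a_full = [1, a_1, ..., a_n]."""
    ext = list(a_full) + [0.0]
    return [ext[i] - (ext[i - 1] if i > 0 else 0.0) for i in range(len(ext))]


def carima_vectors(a_full, c_coef, size):
    """Return (a_bar, s_bar, w_vec) for the CARIMA model with nonzero set point.

    a_full : [1, a_1, ..., a_n]
    c_coef : [c_1, c_2, ...]
    size   : dimension of the state vector (assumed >= n + 1)
    """
    a_bar = difference_polynomial(a_full)            # [1, abar_1, ..., abar_{n+1}]
    abar_pad = a_bar[1:] + [0.0] * (size - (len(a_bar) - 1))
    c_pad = list(c_coef) + [0.0] * (size - len(c_coef))
    s_bar = [c_pad[i] - abar_pad[i] for i in range(size)]          # (13.5.9)
    w_vec = [-v for v in a_full] + [0.0] * (size - len(a_full))   # [-1, -a_1, ...]
    return a_bar, s_bar, w_vec


if __name__ == "__main__":
    print(carima_vectors([1.0, -1.7, 0.72], [-0.5], 5))
```

Since $\Delta$ has a root at q = 1, the coefficients of $\Delta A(q^{-1})$ always sum to zero, which gives a quick sanity check on the result.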


13.5.2 Cost Functions 

For input-output models, considering a predictive horizon of
length N, the LQ cost function is given by

\[ J_1 = y^2(k+N) + \sum_{i=k}^{k+N-1} \left( y^2(i) + \lambda u^2(i) \right), \tag{13.5.13a} \]

where the set-point is zero; in case of nonzero set-points
(w), the equivalent cost function is given by

\[ J_2 = (y(k+N) - w(k+N))^2 + \sum_{i=k}^{k+N-1} \left( (y(i) - w(i))^2 + \lambda u^2(i) \right). \tag{13.5.13b} \]

The problem with $J_1$ or $J_2$ is that, if the underlying process
does not contain an integrator, a zero steady state offset
between the output and the set point will require a nonzero
control; this value of control may not correspond to the
minimum cost ($J_1$ or $J_2$), even in the absence of
disturbances.

The remedy is to consider a cost function where
deviation of the control signal from a steady-state mean
value is penalized, or instead of the actual control signal,
increments of the control signal are penalized as follows:

\[ J_3 = (y(k+N) - w(k+N))^2 + \sum_{i=k}^{k+N-1} \left( (y(i) - w(i))^2 + \lambda(\Delta u(i))^2 \right). \tag{13.5.14} \]

If the measurement equation is given by

\[ y(k) = c^T x(k) + e(k) \]

as in (13.5.8), the cost function $J_1$ may be approximated as

\[ J_1' = E\left\{ x^T(k+N)cc^T x(k+N) + \sum_{i=k}^{k+N-1} \left( x^T(i)cc^T x(i) + \lambda(u(i))^2 \right) \right\}. \tag{13.5.15} \]

Similarly, if the measurement equation is given by

\[ \epsilon(k) = c^T x(k) + e(k), \]

as in (13.5.12), the cost function $J_3$ can be approximated as

\[ J_3' = E\left\{ x^T(k+N)cc^T x(k+N) + \sum_{i=k}^{k+N-1} \left( x^T(i)cc^T x(i) + \lambda(\Delta u(i))^2 \right) \right\}. \tag{13.5.16} \]

With respect to the state-space representation (13.5.2 -
13.5.3) of the CARMA model, the LQ control law minimizing
$J_1'$ is given by

\[ u(k) = -k^T(k)x(k|k). \tag{13.5.17} \]

Similarly, with respect to the state-space representation
(13.5.11 - 13.5.12) for the CARIMA model with nonzero set-point,
the LQ control law minimizing $J_3'$ is given by

\[ \Delta u(k) = -k^T(k)x(k|k); \tag{13.5.18} \]

the estimation of the state x(k|k) is discussed in Sec.
13.6, and computation of the gain k(k) is discussed in
Sec. 13.7.

Remark: $J_1'$ and $J_3'$ will be equivalent to $J_1$ and $J_3$ respectively
for deterministic cases, or if the sequence {e(k)} is
white.


13.6 ESTIMATION OF THE STATE x(k|k) 

The state can be estimated using the Kalman filter
(discussed in Sec. 6.6). Alternatively, transmittance matrices
may be used for estimation; this approach is followed in
the present discussion.

The objective is to estimate the state x(k|k), given
the process model (13.5.2 - 13.5.3) or (13.5.7 - 13.5.8).
Note that in both cases, the process noise and the
measurement noise are the same. In the estimation of
x(k|k-1), the process noise can be eliminated in terms of
the past input and the output signals. x(k|k-1) is
measurement-updated to x(k|k) using a steady-state Kalman
filter. Unlike the Kalman filter, the transmittance matrix
approach does not require the knowledge of the covariances
of the process noise and the measurement noise.


13.6.1 State Estimation from CARMA Model 


The state-space representation for the CARMA model (13.5.1),

\[ A(q^{-1})y(k) = B(q^{-1})u(k-d) + C(q^{-1})e(k), \]

can be interpreted to be in innovations form:

\[ x(k+1|k) = Ax(k|k-1) + bu(k) + se(k), \tag{13.6.1} \]

\[ y(k) = c^T x(k|k-1) + e(k), \tag{13.6.2} \]

where

\[ y(k|k-1) = c^T x(k|k-1), \]

and hence

\[ e(k) = y(k) - y(k|k-1) \]

is referred to as the innovations process. Eliminating the
noise e(k) from (13.6.1) using (13.6.2),

\[ x(k+1|k) = [A - sc^T]x(k|k-1) + bu(k) + sy(k). \]

Let $[A - sc^T] = F$; so

\[ x(k|k-1) = [I - q^{-1}F]^{-1}[bu(k-1) + sy(k-1)], \tag{13.6.3} \]

where following (13.5.4),

\[ F = [A - sc^T] = \begin{bmatrix} -c_1 & 1 & 0 & \cdots & 0 \\ -c_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -c_n & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}, \tag{13.6.4} \]

which is a function of $C(q^{-1})$ alone. Since $C(q^{-1})$ is a
stable polynomial, F is also a stable matrix (i.e. the
eigenvalues of F will be within the unit circle).

Equation (13.6.3) is in a computationally inconvenient
form as it involves polynomial matrix operations. A much
simpler expression evolves from (13.6.3), if the time shift
operations in $[I - q^{-1}F]^{-1}$ are absorbed in the corresponding
input and output data vectors $\mathbf{u}$ and $\mathbf{y}$ respectively as
follows. Let (13.6.3) be rewritten as

\[ x(k|k-1) = \frac{1}{C(q^{-1})}\left[ M_u \mathbf{u}(k-1) + M_y \mathbf{y}(k-1) \right], \tag{13.6.5} \]

where

\[ \mathbf{u}(k-1) = [u(k-1)\ u(k-2)\ \ldots]^T, \]
\[ \mathbf{y}(k-1) = [y(k-1)\ y(k-2)\ \ldots]^T, \]

and $M_u$ and $M_y$, the transmittance matrices of $\mathbf{u}(k-1)$ and
$\mathbf{y}(k-1)$ respectively, are given by the expressions

\[ [I - q^{-1}F]^{-1}[bu(k-1)] = \frac{1}{C(q^{-1})}\left[ M_u \mathbf{u}(k-1) \right], \]

and

\[ [I - q^{-1}F]^{-1}[sy(k-1)] = \frac{1}{C(q^{-1})}\left[ M_y \mathbf{y}(k-1) \right]. \]

The transmittance matrices have simple implementation, as
detailed in Appendix 13B.

Note that (13.6.5) is the estimate of x based on the
available input and output data up to time k-1. At time k,
using the additional information y(k), the estimate of the
state may be measurement updated as

\[ x(k|k) = x(k|k-1) + k_f\left( y(k) - c^T x(k|k-1) \right); \tag{13.6.6} \]

$k_f$ is the filter gain of the steady-state Kalman filter for
the state estimator.

Again using (13.6.6), from (13.6.1 - 13.6.2),

\[ x(k+1|k) = Ax(k|k) + bu(k) + [s - Ak_f]e(k). \]

Thus the effect of noise in the progression of the state
estimate is minimized for

\[ s = Ak_f. \]

As A is singular, the solution of this equation is not
unique. In the present case, one solution which is always
valid (Lam, 1982a) is given by

\[ k_f = [1\ c_1\ c_2\ \ldots]^T. \]

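Summing up, for a CARMA process,

\[ x(k|k) = x(k|k-1) + k_f\left( y(k) - c^T x(k|k-1) \right) \]
\[ \quad = \frac{1}{C(q^{-1})}\left[ M_u \mathbf{u}(k-1) + M_y \mathbf{y}(k-1) \right] + k_f e(k) \]
\[ \quad = \frac{1}{C(q^{-1})}\left[ M_u \mathbf{u}(k-1) + M_y \mathbf{y}(k-1) + k_f\left( A(q^{-1})y(k) - B(q^{-1})u(k-d) \right) \right]. \tag{13.6.7} \]

The LQ control law is given by

\[ u(k) = -k^T(k)x(k|k). \]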

Remarks 

(a) The optimal control law, $u(k) = -k^T(k)x(k|k)$, should
incorporate the knowledge of the measurement y(k). If
instead of x(k|k), x(k|k-1) is used, the control with
incomplete state information will result, since the
information y(k) is left unutilized.

(b) If the noise observer polynomial $T(q^{-1})$ is used (as
discussed in Sec. 2.4.2), it simply replaces $C(q^{-1})$ in
(13.6.4), (13.6.7) etc.

(c) In the state estimation stage, the division by $C(q^{-1})$
need not be explicitly performed, if computing the control
u(k) is the prime objective. The vector $C(q^{-1})x(k|k)$ is
obtained directly from (13.6.7); the control u(k) is
computed from

\[ u(k) = -k^T(k)\left[ C(q^{-1})x(k|k) \right] - (C(q^{-1}) - 1)u(k). \]

The state estimation and the LQ control scheme are shown in
Fig. 13.6.1.

Figure 13.6.1 State estimation for CARMA model and the
computation of the LQ control law u(k). $M_u$ and $M_y$ are
the transmittance matrices, $k_f$ is the steady state
Kalman filter gain, k(k) is the control gain.

Example 13.6.1 Determine x(k|k) for the process:

\[ y(k) - 1.7y(k-1) + 0.72y(k-2) = 0.4u(k-4) + 0.8u(k-5) + e(k) - 0.5e(k-1). \]

Here, the size of the state vector (in (13.6.1)) is given by
max (degree of $A(q^{-1})$, (degree of $B(q^{-1})$ + d), degree of
$C(q^{-1})$) = max (2,5,1) = 5. Again,

\[ b = [0\ 0\ 0\ 0.4\ 0.8]^T, \]

\[ s = [(-0.5+1.7)\ (-0.72)\ 0\ 0\ 0]^T = [1.2\ -0.72\ 0\ 0\ 0]^T, \]

and $a_1 = -1.7$ and $a_2 = 0.72$ in A. Following (13.6.4),

\[ F = [A - sc^T] = \begin{bmatrix} 0.5 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \]

Following (13B.13 in Appendix 13B),

\[ M_u = \begin{bmatrix} 0 & 0 & 0 & 0.4 & 0.8 \\ 0 & 0 & 0.4 & 0.6 & -0.4 \\ 0 & 0.4 & 0.6 & -0.4 & 0 \\ 0.4 & 0.6 & -0.4 & 0 & 0 \\ 0.8 & -0.4 & 0 & 0 & 0 \end{bmatrix}, \qquad M_y = \begin{bmatrix} 1.2 & -0.72 & 0 & 0 & 0 \\ -0.72 & 0.36 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \]

The state estimate is now computed using (13.6.7), where
$k_f = [1\ -0.5\ 0\ 0\ 0]^T$, and

\[ C(q^{-1})e(k) = y(k) - 1.7y(k-1) + 0.72y(k-2) - 0.4u(k-4) - 0.8u(k-5). \]

u(k) can now be computed as discussed in Example 13.7.1.
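
For comparison, the same estimate can be obtained by running the innovations-form recursion x(k+1|k) = Fx(k|k-1) + bu(k) + sy(k) directly, followed by the measurement update (13.6.6). The sketch below does this for the process of Example 13.6.1; it is an alternative to the transmittance matrix implementation of Appendix 13B, not a rendering of it. The function name and the zero initial estimate x(0|-1) = 0 are assumptions.

```python
# Quantities of Example 13.6.1
F = [[0.5, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 0.0, 0.0]]
b = [0.0, 0.0, 0.0, 0.4, 0.8]
s = [1.2, -0.72, 0.0, 0.0, 0.0]
k_f = [1.0, -0.5, 0.0, 0.0, 0.0]


def estimate_states(u_seq, y_seq):
    """Return the filtered states x(k|k) and the innovations e(k).

    Assumes x(0|-1) = 0 (an illustrative initialization).
    """
    n = len(b)
    x_pred = [0.0] * n                      # x(k|k-1)
    estimates, innovations = [], []
    for u, y in zip(u_seq, y_seq):
        e = y - x_pred[0]                   # c = [1 0 ... 0]^T
        innovations.append(e)
        # Measurement update (13.6.6)
        estimates.append([x_pred[i] + k_f[i] * e for i in range(n)])
        # Prediction: x(k+1|k) = F x(k|k-1) + b u(k) + s y(k)
        x_pred = [sum(F[i][j] * x_pred[j] for j in range(n)) + b[i] * u + s[i] * y
                  for i in range(n)]
    return estimates, innovations
```

With noise-free data generated from the difference equation of the example (zero initial conditions), the innovations stay at zero to rounding, since the predictor then reproduces the process output exactly.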


13.6.2 State Estimation from CARIMA Model 

State estimation for LQ regulator with zero set point

Following (13.5.7 - 13.5.8)

\[ x(k+1|k) = \bar{A}x(k|k-1) + b\Delta u(k) + \bar{s}e(k), \tag{13.6.8} \]

\[ y(k) = c^T x(k|k-1) + e(k). \tag{13.6.9} \]

The state estimate is given by

\[ x(k|k) = x(k|k-1) + k_f\left( y(k) - c^T x(k|k-1) \right). \]

Eliminating e(k) in (13.6.8) by substituting from (13.6.9)

\[ x(k+1|k) = [\bar{A} - \bar{s}c^T]x(k|k-1) + b\Delta u(k) + \bar{s}y(k). \]

That is

\[ x(k|k-1) = [I - q^{-1}F]^{-1}[b\Delta u(k-1) + \bar{s}y(k-1)], \tag{13.6.10} \]

where $F = [\bar{A} - \bar{s}c^T]$ is the same as in case of the CARMA model
(13.6.4), and hence is strictly stable. Introducing the
transmittance matrices $M_u$ and $M_{yI}$ for $\Delta\mathbf{u}(k-1)$ and $\mathbf{y}(k-1)$
respectively,

\[ x(k|k-1) = \frac{1}{C(q^{-1})}\left[ M_u \Delta\mathbf{u}(k-1) + M_{yI}\mathbf{y}(k-1) \right], \]

where

\[ \Delta\mathbf{u}(k-1) = [\Delta u(k-1)\ \Delta u(k-2)\ \ldots]^T, \]
\[ \mathbf{y}(k-1) = [y(k-1)\ y(k-2)\ \ldots]^T. \]

Hence the desired state estimate is given by

\[ x(k|k) = \frac{1}{C(q^{-1})}\left[ M_u \Delta\mathbf{u}(k-1) + M_{yI}\mathbf{y}(k-1) + k_f\left( \bar{A}(q^{-1})y(k) - B(q^{-1})\Delta u(k-d) \right) \right], \]

where

\[ k_f = [1\ c_1\ c_2\ \ldots]^T. \]

The LQ control law is given by

\[ \Delta u(k) = u(k) - u(k-1) = -k^T(k)x(k|k). \tag{13.6.11} \]

Remarks

(a) Here, the control law is expressed in terms of $\Delta u$
because the control increments are costed in the cost
function (13.5.16).

(b) F, $M_u$, $\mathbf{y}(k-1)$ and $k_f$ are the same for the CARMA and
the CARIMA models. $M_{yI}$ is different from $M_y$ because of $\bar{s}$
appearing in (13.6.10) instead of s in (13.6.3).


State estimation for LQ controller with nonzero set point

The state-space model (13.5.11 - 13.5.12) for the process
(13.5.10)

\[ \bar{A}(q^{-1})\epsilon(k) = B(q^{-1})\Delta u(k-d) - A(q^{-1})\Delta w(k) + C(q^{-1})e(k), \]

can be expressed in the innovations form:

\[ x(k+1|k) = \bar{A}x(k|k-1) + b\Delta u(k) + w\Delta w(k+1) + \bar{s}e(k), \]
\[ \epsilon(k) = c^T x(k|k-1) + e(k); \]

the state estimate is given by

\[ x(k|k) = x(k|k-1) + k_f\left( \epsilon(k) - c^T x(k|k-1) \right). \]

Eliminating e(k) from the state-space model,

\[ x(k+1|k) = [\bar{A} - \bar{s}c^T]x(k|k-1) + b\Delta u(k) + w\Delta w(k+1) + \bar{s}\epsilon(k). \]

That is

\[ x(k|k-1) = [I - q^{-1}F]^{-1}[b\Delta u(k-1) + w\Delta w(k) + \bar{s}\epsilon(k-1)], \]

where $F = [\bar{A} - \bar{s}c^T]$ is the same as the F in the CARMA case
(13.6.4), and hence is strictly stable. Introducing the
transmittance matrices $M_u$, $M_w$, $M_\epsilon$ for $\Delta\mathbf{u}(k-1)$, $\Delta\mathbf{w}(k)$ and
$\boldsymbol{\epsilon}(k-1)$ respectively

\[ x(k|k-1) = \frac{1}{C(q^{-1})}\left[ M_u \Delta\mathbf{u}(k-1) + M_w \Delta\mathbf{w}(k) + M_\epsilon \boldsymbol{\epsilon}(k-1) \right], \]

where

\[ \Delta\mathbf{u}(k-1) = [\Delta u(k-1)\ \Delta u(k-2)\ \ldots]^T, \]
\[ \Delta\mathbf{w}(k) = [\Delta w(k)\ \Delta w(k-1)\ \ldots]^T, \text{ and} \]
\[ \boldsymbol{\epsilon}(k-1) = [\epsilon(k-1)\ \epsilon(k-2)\ \ldots]^T. \]

Hence the desired state estimate is given by

\[ x(k|k) = \frac{1}{C(q^{-1})}\left[ M_u \Delta\mathbf{u}(k-1) + M_w \Delta\mathbf{w}(k) + M_\epsilon \boldsymbol{\epsilon}(k-1) \right] + k_f e(k) \]
\[ \quad = \frac{1}{C(q^{-1})}\left[ M_u \Delta\mathbf{u}(k-1) + M_w \Delta\mathbf{w}(k) + M_\epsilon \boldsymbol{\epsilon}(k-1) + k_f\left( \bar{A}(q^{-1})y(k) - B(q^{-1})\Delta u(k-d) \right) \right]. \tag{13.6.12} \]

Fig. 13.6.2 summarizes the state estimation scheme. The LQ
control law is given by (13.6.11).

Figure 13.6.2 Estimation of the state x(k|k) for a
CARIMA model with nonzero set point w(k). For zero set
point cases, $M_\epsilon$ and $\boldsymbol{\epsilon}(k-1)$ are replaced with $M_y$ and
$\mathbf{y}(k-1)$ respectively. For CARMA models, all $(1 - q^{-1})$
blocks disappear. Division by $C(q^{-1})$ need not be
explicitly implemented, as shown in Fig. 13.6.1.


13.7 COMPUTATION OF CONTROL 

The stochastic optimal controller comprises a deterministic 
controller followed by a stochastic state estimator as 
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discussed in Sec. 13. 4. 

Restating (13.2.4 - 13.2.6), for the state-space model
(13.6.2 - 13.6.3), the LQ control gain k(k) is obtained from
the backward recursive solution of the following equations

$$k^T(k) = (\lambda + b^T P(k+1) b)^{-1}\, b^T P(k+1) A, \tag{13.7.1}$$

$$P^*(k) = P(k+1) - P(k+1) b\,(\lambda + b^T P(k+1) b)^{-1}\, b^T P(k+1), \tag{13.7.2}$$

$$P(k) = Q + A^T P^*(k) A, \tag{13.7.3}$$

where $Q = cc^T$ and P(k+N) = Q. In case of models derived from
the CARIMA representation (i.e. for the model (13.5.7 -
13.5.8)), the system matrix of that model replaces A in
(13.7.1) and (13.7.3); all other terms remain the same.
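As a rough illustration, the recursion (13.7.1 - 13.7.3) can be
sketched in NumPy as below. This is a plain matrix version written
under stated assumptions, not the U-D factorized implementation
discussed in Sec.13.7.2; the function name `lq_gain` and the
second-order test process are hypothetical.

```python
import numpy as np

def lq_gain(A, b, c, lam, N):
    """Sketch of the backward recursion (13.7.1)-(13.7.3).

    Starts from the terminal condition P(k+N) = Q = c c^T, steps back to
    P(k+1), and returns the gain vector k(k) together with P(k+1).
    """
    Q = np.outer(c, c)
    P = Q.copy()                                   # P(k+N) = Q
    for _ in range(N - 1):                         # P(k+N) -> P(k+1)
        Pb = P @ b
        P_star = P - np.outer(Pb, Pb) / (lam + b @ Pb)   # (13.7.2)
        P = Q + A.T @ P_star @ A                         # (13.7.3)
    Pb = P @ b                                     # unweighted gain P(k+1) b
    k = A.T @ Pb / (lam + b @ Pb)                  # (13.7.1), written as a column
    return k, P

# Hypothetical second-order process in observable canonical form.
A = np.array([[1.7, 1.0], [-0.72, 0.0]])
b = np.array([0.4, 0.8])
c = np.array([1.0, 0.0])
k, P = lq_gain(A, b, c, lam=0.5, N=12)
print("gain k(k):", k)
```

The control would then be formed as u(k) = -k^T(k) x(k|k). The gain
is computed once, after the loop, since only the covariance needs to
be propagated inside it.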

At every sampling time k, the control gain k(k) is 
computed using the covariance matrix P(k+1). The computation 
of P(k+1) involves two main considerations: (i) Control 
horizon, (ii) Implementation using the principle of duality. 

Remark: The computation of k does not depend on whether the 
set point is zero or not, as structurally the cost function 
minimized remains the same (as discussed in Sec.13.5.2). 


13.7.1 Control Horizons 

The control gain k(k) depends on the control horizon 
considered. The different classes of horizons (shown in 
Fig.13.7.1) are as follows. 

(a) Infinite horizon: In this case the terminal covariance
considered is $P(\infty) = Q$. At each sampling instant P(k+1) is
calculated through backward recursion of (13.7.2) and
(13.7.3) to convergence, starting from the terminal stage
$P(\infty)$.

(b) Adaptive infinite horizon: Here the recursion (13.7.2 -
13.7.3) is carried out a finite number of times (usually
once) starting with the covariance obtained at the last
sampling instant.

(c) Receding finite horizon: The terminal covariance
considered is P(k+N) = Q, and N backward
recursions of (13.7.2 - 13.7.3) are executed to compute
P(k+1). If the process parameters remain unchanged, the
resulting control law will be time invariant.

Figure 13.7.1 Different horizons for backward iterations
of the covariance matrix for the LQ controller:
(a) Infinite horizon: iterations from $P(\infty) = Q$ to
convergence at P(k+1).
(b) Adaptive infinite horizon: one (or a finite number of)
iterations starting from the covariance of the last sampling
time; the horizon recedes.
(c) Receding finite horizon: N iterations from the fixed
P(k+N) = Q to P(k+1); the horizon recedes.
(d) Fixed horizon: iterations from the fixed P(k+N) = Q to
P(k+1).


(d) Fixed horizon: P(k+N) = Q is considered as the terminal
covariance and the current time is assumed to lie between the
fixed time instants k and (k+N). The backward recursions
(13.7.2 - 13.7.3) are performed starting from P(k+N) = Q to
P(k+1), and the Kalman control gain k(i) is calculated at each
recursion and stored. As time progresses from k to (k+N), the
corresponding precomputed gains k(i) are used to calculate the
control law.


13.7.2 Implementation based on the Principle of Duality 

The principle of duality between the optimal filter and the
optimal controller is discussed in Sec.13.4. The two stages
of the Riccati equation (13.7.2) and (13.7.3) are duals of
the measurement update and the time update respectively of
the Kalman filter. Thus computation of the gain k(k) will
require two subroutine calls: (i) the measurement update
subroutine (which is also used for the least squares
parameter estimation), and (ii) the time update subroutine.
For numerical stability and computational efficiency, U-D
factorization of the covariance matrix is used. The
measurement update is discussed in Appendix 3A.

Bierman (1977, p.124) discusses a general procedure for
the U-D time-update (13.7.3) based on matrix operations.
However, a much simpler vector implementation is possible in
the present case by the reformulation of the decomposition
problem as follows. Given the factors $U^*$ and $D^*$, the
problem is to compute the updated factors of P in

$$P(k) = Q + A^T P^*(k) A, \qquad P^* = U^* D^* U^{*T},$$

where

$$Q = cc^T = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} = g\,q\,g^T \ \text{(say)};$$

that is

$$g = [1\ 0\ \dots\ 0]^T, \quad \text{and} \quad q = 1.$$

So

$$\begin{aligned}
P &= g\,q\,g^T + A^T P^* A = [g \ \ A^T U^*]\ \mathrm{diag}[q,\ D^*]\ [g \ \ A^T U^*]^T \\
&= W \bar{D} W^T \ \text{(say)} \\
&= U D U^T,
\end{aligned} \tag{13.7.4}$$

where $\bar{D} = \mathrm{diag}[q,\ D^*]$. The updated U and D
factors of P can be computed from $W \bar{D} W^T$
using the modified weighted Gram-Schmidt algorithm. In the
present context, since A is in observable canonical form, W
has a strictly zero lower triangular part as shown below.

Example: Let

$$A = \begin{bmatrix} -a_1 & 1 & 0 \\ -a_2 & 0 & 1 \\ -a_3 & 0 & 0 \end{bmatrix}, \quad \text{and} \quad U^* = \begin{bmatrix} 1 & u^*_{12} & u^*_{13} \\ 0 & 1 & u^*_{23} \\ 0 & 0 & 1 \end{bmatrix}. \tag{13.7.5}$$

Hence

$$W = [g \ \ A^T U^*] = \begin{bmatrix} 1 & -a_1 & (-a_1 u^*_{12} - a_2) & (-a_1 u^*_{13} - a_2 u^*_{23} - a_3) \\ 0 & 1 & u^*_{12} & u^*_{13} \\ 0 & 0 & 1 & u^*_{23} \end{bmatrix}. \tag{13.7.6}$$

Since W has a strictly zero lower part, construction of W and
the subsequent orthogonalization become easier, with reduced
computational load.

The U-D time-update, its vector formulation, and a FORTRAN
implementation are given in Appendix 13C.
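The construction (13.7.4), followed by a modified weighted
Gram-Schmidt pass, can be sketched as follows. This is a minimal
dense NumPy sketch, not the vector FORTRAN routine of Appendix 13C:
it does not exploit the zero lower part of W, and the function name
`ud_time_update` and the numerical values are hypothetical.

```python
import numpy as np

def ud_time_update(A, U_star, d_star, q=1.0):
    """Return (U, d) with U diag(d) U^T = g q g^T + A^T (U* diag(d*) U*^T) A."""
    n = A.shape[0]
    g = np.zeros(n)
    g[0] = 1.0
    W = np.column_stack([g, A.T @ U_star])       # W = [g  A^T U*], n x (n+1)
    d_bar = np.concatenate([[q], d_star])        # diag[q, D*]
    U = np.eye(n)
    d = np.zeros(n)
    # Modified weighted Gram-Schmidt on the rows of W, last row first.
    for j in range(n - 1, -1, -1):
        d[j] = W[j] @ (d_bar * W[j])             # weighted squared norm of row j
        for i in range(j):
            # Projection coefficient of row i on row j (zero if row j vanishes).
            U[i, j] = (W[i] @ (d_bar * W[j])) / d[j] if d[j] > 0.0 else 0.0
            W[i] = W[i] - U[i, j] * W[j]         # remove the component along row j
    return U, d

# Hypothetical third-order example with A in observable canonical form.
A = np.array([[1.5, 1.0, 0.0], [-0.7, 0.0, 1.0], [0.1, 0.0, 0.0]])
U_star = np.array([[1.0, 0.3, -0.4], [0.0, 1.0, 0.2], [0.0, 0.0, 1.0]])
d_star = np.array([2.0, 1.0, 0.5])
U, d = ud_time_update(A, U_star, d_star)
```

The rows of W are orthogonalized with respect to the weights
diag[q, D*]; the projection coefficients fill the unit upper
triangular U and the weighted norms form the new D.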


Remarks: Equivalence between LQ-Control and GPC
The deterministic LQ-control with finite horizon (given by,
for example, (13.5.14) and (13.5.16)) and the generalized
predictive control (GPC) discussed in Sec.12.6 minimize the
same cost function and produce identical control actions,
although the LQ-controller is state-space model based and is
solved through backward recursions, whereas GPC is
input-output model based and is solved through predictions in the
forward direction. Consequently the stability properties of
both the control methods are the same. Numerically stable
implementations are available for both the algorithms. For
further discussions see Clarke et al. (1987).


Example 13.7.1 Compute the control law minimizing $J_1$, given
by (13.5.13a) with $\lambda = 0.5$ and N = 12, for the process
considered in Example 13.6.1.

Two subroutines will be used to calculate k(k):

(i) U-D measurement update (Appendix 3A) to solve (13.7.2),

(ii) U-D time update (Appendix 13C) to solve (13.7.3).

Step 1: Initialize with $P(k+12) = cc^T$, where $c = [1\ 0\ 0\ 0\ 0]^T$.
That is, initialize with the U-D factors,

$$D(k+12) = [1\ 0\ 0\ 0\ 0]^T;$$

all elements of the vector U(k+12) are initialized as zeros.

Step 2: Starting with U(k+12) and D(k+12), perform the U-D
measurement update to produce $U^*(k+11)$ and $D^*(k+11)$.

Step 3: Perform the U-D time update to produce the updated
U(k+11) and D(k+11) vectors.

Step 4: Iterate through steps 2 and 3 until $U^*(k)$ and $D^*(k)$
are produced.

Remarks: The computation of the gain k(k) in (13.7.1) is not
performed within the iterative loop. k(k) emerges from the
last U-D measurement update as $k_u(k) = P(k+1)b$, which is the
unweighted control gain. The measurement update routine also
produces the scalar $(\lambda + b^T P(k+1) b)$ as ALPHAJ (see
Appendix 3A). So the computation of k(k) proceeds as follows.

Step 5: Compute k(k) as

$$k^T(k) = \frac{k_u^T(k)\,A}{\lambda + b^T P(k+1) b}, \qquad k_u(k) = P(k+1) b.$$

In the present case,

$$k(k) = [1.6757\ \ 1.6243\ \ 1.5077\ \ 1.3039\ \ 0.9846]^T.$$

Step 6: The control $u(k) = -k^T(k)\,x(k|k)$ is computed from

$$C(q^{-1})\,u(k) = -k^T(k)\,[M_u \mathbf{u}(k-1) + M_y \mathbf{y}(k-1) + k_f\,e(k)],$$

where all the terms excepting k(k) are obtained as in
Example 13.6.1. Hence the control law works out as

$$\begin{aligned}
u(k) = {} & -0.8636\,y(k) + 0.2368\,y(k-1) - 0.8092\,u(k-1) \\
& - 0.9916\,u(k-2) - 1.2935\,u(k-3) - 0.998\,u(k-4).
\end{aligned}$$

Remark: If y(k) and hence e(k) is ignored,

$$\begin{aligned}
u(k) = {} & -0.8413\,y(k-1) + 0.6218\,y(k-2) \\
& - 0.8092\,u(k-1) - 0.9916\,u(k-2) - 1.2935\,u(k-3) \\
& - 1.3433\,u(k-4) - 0.6909\,u(k-5).
\end{aligned}$$


13.7.3 Implementation Aspects and Features 

Some of the main features ascribing numerical stability,
computational efficiency and robustness to the present
algorithm are discussed here.

Riccati equation

The solution of the Riccati equations (13.7.2 - 13.7.3)
using U-D factorization (discussed in Sec.13.7.2) is found
to be particularly robust against round off errors and other
instabilities. Although $UD^{1/2}$ corresponds directly to the
square root of the covariance matrix (since $P = UDU^T$), and
hence has all the advantages of square-root propagation,
the square roots are never explicitly evaluated.

The U-D factorization for the covariance time-update in
the present case is particularly short and efficient due to
vector implementation.

Control cost factor ($\lambda$)

If $\lambda$, the scalar cost factor on control in (13.5.13 -
13.5.16), is zero and if the leading element of the b vector is
also zero, numerical difficulties in the U-D measurement
update of the Riccati equation result, because the
Agee-Turner factorization used (Bierman, 1977, p.78) fails under
such conditions. However, this problem can be avoided by
assigning a vanishingly small value to $\lambda$ (for example, a
number near the floating point zero of the computer). Note
that with vanishingly small $\lambda$, the LQ controller can handle
nonminimum-phase processes, unlike the generalized minimum
variance controller, also minimizing the cost (12.2.6).

The value of $\lambda$ can have a wide range. Larger $\lambda$ results
in slower (or more sluggish) control response.

Implementational simplicity 

(i) The implementation does not require explicit construction
of the matrix A (or its counterpart for the CARIMA model);
the parameters $a_i$ (or their CARIMA counterparts) can be used
directly. Similarly, the $P^*$ and P matrices need not be formed
explicitly; only their factorized U-D components, expressed
as vectors, are used.

(ii) If the solution of the Riccati equation requires more
than one recursion, only the covariances are updated inside
the recursive loop, i.e. by cycling through equations
(13.7.2) and (13.7.3) without calculating the Kalman control
gain. On exit, the unweighted gain, which is available as a
by-product of the covariance update, appears as P(k+1)b, from
which the final control gain (13.5.36) is calculated; the
scalar $(\lambda + b^T P(k+1) b)$ is also available from the
covariance update.

Mismodelled process dynamics 

Due to over-parameterization (which may even be due to fast
sampling), there may be unstable common (or nearly common)
factors between the estimated $A(q^{-1})$ and $B(q^{-1})$ polynomials.
These mismodelled dynamics may show up in unboundedness of
the diagonal elements of the control covariances. This can
be arrested by using hard constraints on the D-factors, and
the algorithm can still be operational.

Computation time 

Unlike algorithms involving solution of the Diophantine
equations or spectral factorization, the solution of the
Riccati equation requires a fixed time for the adaptive
infinite and fixed horizon cases.

The execution time for the present LQ control law is
approximately equivalent to 5 U-D filter updates: 1 parameter
estimation, 2 U-D Riccati measurement updates, 1 U-D
time update and 2 state estimations (for non-zero set point)
using transmittance matrices. The computation time can be
further reduced by using lattice filter mechanizations.


13.7.4 Self-tuning Control

So far the discussions have been confined to the design of 
the LQ controller, when the process parameters are known. In 
practice, the process parameters are usually unknown. In 
such cases, the principle of self-tuning can be used; that 
is, the parameters of the process model are estimated from 
the input and output data, and assuming the estimated 
parameter values to be true, the state is estimated and the 
LQ control gain is computed. In other words, the principle
of certainty equivalence (introduced in Sec.12.2.1) is
invoked in computing the control action. The term
self-tuning implies that asymptotically, as the estimated
parameters reach their true values, the computed control law
will be the same as the optimal control which could be
produced if the actual parameters were known.
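The estimation half of the self-tuning loop can be sketched as
below, assuming a recursive least squares estimator with no
forgetting. The first-order process, the sign convention of its
parameters, the function name `rls_update` and the random probing
input are hypothetical; a self-tuner would instead compute u(k)
from the current estimates by certainty equivalence, and would
normally use the U-D form of the update rather than this plain
covariance form.

```python
import numpy as np

def rls_update(theta, P, phi, y):
    """One recursive least squares step with no forgetting."""
    Pphi = P @ phi
    gain = Pphi / (1.0 + phi @ Pphi)
    theta = theta + gain * (y - phi @ theta)    # correct by the prediction error
    P = P - np.outer(gain, Pphi)                # covariance update
    return theta, P

rng = np.random.default_rng(0)
a_true, b_true = 0.9, 0.5          # hypothetical process: y(k) = a y(k-1) + b u(k-1)
theta = np.zeros(2)                # no prior knowledge of the parameters
P = 1e6 * np.eye(2)                # large initial covariance
y_prev, u_prev = 0.0, 0.0
for k in range(200):
    y = a_true * y_prev + b_true * u_prev       # noise-free output, for clarity
    phi = np.array([y_prev, u_prev])
    theta, P = rls_update(theta, P, phi, y)
    u = rng.standard_normal()      # placeholder input; a self-tuner would use theta here
    y_prev, u_prev = y, u
print("estimated [a, b]:", theta)
```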


13.8 SIMULATION STUDIES 

LQ control of single-input single-output processes is
discussed. A CARIMA process model is considered. The
parameters are estimated using the recursive least squares
estimator, with no forgetting. No prior knowledge of the
parameters is assumed. The cost function minimized is $J_3$,
given by (13.5.16); an adaptive infinite control horizon
(Fig.13.7.1) is used for computation of the control input u,
which is constrained to lie within +100 and -100. The
cost factor $\lambda$ is varied from 2 to 0 (where 0 is assumed to
be vanishingly small). Additive step changes $\xi_y$ and $\xi_u$ to
output y and input u respectively are applied to simulate
external disturbances acting on the process. Simulation runs
over 240 samples are shown in Figs.13.8.1 to 13.8.3, and are
summarized in Table 13.8.1. These results are extracted from
Clarke, Kanjilal, and Mohtadi (1985b).

Figure 13.8.1 LQ control of a nonminimum-phase process.

Exercise 1: A nonminimum-phase process is controlled.
For $\lambda = 0$, the unstable zero in $B(q^{-1})$ at -2 is reflected
inside the unit circle, resulting in a closed-loop pole at
-1/2; although the control signal is active, the stability
of the loop is maintained as shown in Fig.13.8.1.
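The reflection of the zero can be checked numerically with a small
sketch. It assumes the plain (non-incremental) deterministic model
of the Exercise 1 process in observable canonical form with
$Q = cc^T$, a vanishingly small $\lambda$ of 1e-6, and the Riccati
recursion (13.7.2 - 13.7.3) iterated towards the infinite horizon
solution; the simulations in the text use the CARIMA form and
estimated parameters, so this only illustrates the pole location.

```python
import numpy as np

# Exercise 1 process: A(q^-1) = 1 - 1.7 q^-1 + 0.72 q^-2,
#                     B(q^-1) = q^-1 (0.4 + 0.8 q^-1), zero at -2.
A = np.array([[1.7, 1.0], [-0.72, 0.0]])   # observable canonical form
b = np.array([0.4, 0.8])
c = np.array([1.0, 0.0])
lam = 1e-6                                 # "vanishingly small" control cost (assumed value)

Q = np.outer(c, c)
P = Q.copy()
for _ in range(500):                       # iterate (13.7.2)-(13.7.3) to convergence
    Pb = P @ b
    P_star = P - np.outer(Pb, Pb) / (lam + b @ Pb)
    P = Q + A.T @ P_star @ A

Pb = P @ b
k = A.T @ Pb / (lam + b @ Pb)              # gain (13.7.1)
poles = np.linalg.eigvals(A - np.outer(b, k))   # closed loop under u = -k^T x
print("closed-loop poles:", np.sort(poles.real))
```

Under these assumptions one pole should settle close to -1/2, the
reflection of the zero at -2, with the other close to the origin.
With $\lambda$ exactly zero the recursion started from Q would stay at
the minimum variance solution, which cancels the unstable zero;
this is why a small positive value is used.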

Exercise 2: An open-loop stable minimum-phase process
is considered; the performances of the LQ controller and the
generalized minimum variance (GMV) controller (12.2.6) are
studied, when the time-delay is underestimated as 1 instead
of the actual 3. Due to underestimation of the time-delay
and the leading coefficients of $B(q^{-1})$ being zero (or
estimated to be too low), the estimated process is rendered
nonminimum-phase. With high cost (i.e. high $\lambda$) on control,
GMV can still control the process, but with $\lambda = 0$ the GMV,
which becomes the minimum variance control, fails. The LQ
control produces a stable control even when $\lambda = 0$. The
results are shown in Fig.13.8.2.

Figure 13.8.2 Comparative performance of controllers
for over-parameterized process model and underestimated
time-delay: (a) LQ control, (b) GMV control (Sec.12.2.2).

Figure 13.8.3 LQ control of an unstable,
nonminimum-phase process with over-parameterized
model and underestimated time-delay.

Table 13.8.1 Summary of simulation exercises

Exercise        Simulated process                     Remarks
--------------  ------------------------------------  ---------------------------------
Exercise 1      B(q^-1) = q^-1 (0.4 + 0.8 q^-1)       Nonminimum-phase process, correct
(Fig.13.8.1)    A(q^-1) = 1 - 1.7 q^-1 + 0.72 q^-2    parameterization, correct
                                                      time-delay. $\lambda$ varied from
                                                      2 to 0; stable control.

Exercise 2      B(q^-1) = q^-3 (1 + 0.5 q^-1)         Minimum-phase process, correct
(Fig.13.8.2)    A(q^-1) = 1 - 0.9 q^-1                parameterization, underestimated
                                                      time-delay (1 instead of 3). GMV
                                                      stable for high $\lambda$, but
                                                      fails for $\lambda = 0$ (as it
                                                      becomes MV control). LQ control
                                                      stable for $\lambda$ = 2 to 0.

Exercise 3      B(q^-1) = q^-2 (0.5 - 0.8 q^-1)       Open-loop unstable
(Fig.13.8.3)    A(q^-1) = 1 - 3 q^-1 + 2 q^-2         nonminimum-phase process.
                                                      Overestimated model order,
                                                      underestimated time-delay
                                                      (1 instead of 2). $\lambda$ = 2
                                                      to 0. LQ control stable.


Exercise 3: Here, control of a nonminimum-phase,
open-loop unstable process is considered, when the controller is
based on an over-parameterized model with underestimated
time-delay (i.e. 1 instead of the actual time-delay of 2).
The control disturbance $\xi_u$ of magnitude +0.5, -0.25 and
-0.25 is applied at the 45th, 140th and 165th sampling instants
respectively. The estimator has three $a_i$ and four $b_i$
parameters, although only two $a_i$ and three $b_i$ parameters are
required for a correctly parameterized model with
underestimated time-delay. The plant has a negative going
nonminimum-phase characteristic which is responsible for the
initial movement of the plant in the negative direction at
positive step changes in the set point. However, the LQ
controller provides good response both in set-point
following and in disturbance rejection.


13.9 CONCLUSIONS 

The state-space formulation of LQ control has been studied;
a linear quadratic cost criterion is considered, which is
minimized over a predictive horizon. Control of both time
invariant and time varying processes has been discussed.

The stochastic LQ control has been shown to comprise
two separate components, namely a linear estimation problem
and a quadratic control problem for a linear deterministic
process; the former can be solved using the Kalman filter to
produce optimal state-estimates, whereas the deterministic
LQ controller works out to be the dual of the optimal
estimator.

A method of state estimation has been presented, which
is based on the state-space innovations model of the
input-output representation and a steady-state Kalman filter.
Numerically stable and computationally efficient
implementations have been discussed.

Irrespective of the duality between the optimal control 
and the optimal estimator, structurally an optimal control 
problem is better posed. This is because optimal control is 
a closed-loop problem and there is a specific set-point 
trajectory to be followed, whereas an optimal estimator, which 
is an open-loop problem, has no such guideline. As a result,
an optimal estimator (or filter), by itself, can be
comparatively sensitive to improper assumptions (e.g. of noise
covariances etc.) in the implementation, whereas when used
within the optimal controller, the sensitivity of the
controller to the same assumptions will be much less.

Optimal predictors, discussed in earlier chapters, also 
have the same configurational disadvantage. Again, following 
the same arguments, the long range predictive control 
methods (discussed in Chapter 12) are significantly robust 
due to the presence of the inherent feedback loop, and are 
relatively insensitive to the quality of the multistep 
predictions. 

The equivalence between the performances of the
deterministic state-space LQ control and the generalized
predictive control shows that for identical processes, when the
same cost functions are optimized, the same control action
is produced irrespective of the design approaches.


REFERENCES 

Remarks: Linear quadratic control is a widely covered
subject. The texts [1,5,11,14,15,16] provide a detailed
study of the theoretical as well as implementation aspects.
There are many papers on this subject: [2,9]. Optimal
estimation of states is studied in the seminal paper [10],
where the principle of duality between the estimator and the
controller is also introduced. Design of the optimal
controller through dynamic programming is treated in [3].
The LQ controller designed in a stochastic environment from
input-output models appears in [6,7,12,13], where the
computational aspects are also discussed. Detailed treatment
of computational aspects appears in [4]. Comparative study
of LQ control and the generalized predictive control methods
features in [8].

[1] Anderson, B.D.O., and J. Moore (1990): Optimal Control: 
Linear Quadratic Methods, Prentice Hall, Englewood 
Cliffs, New Jersey. 

[2] Athans, M. (1971): ‘The role and use of the stochastic
Linear-Quadratic-Gaussian problem in control system
design’, IEEE Trans. on Auto. Control, AC-16 (6),
529-552.

[3] Bellman, R. (1957): Dynamic Programming, Princeton
University Press, Princeton, N.J.

[4] Bierman, G. J. (1977): Factorization Methods for Dis- 
crete Sequential Estimation, Academic Press, New York. 

[5] Bryson, A.E., and Y.C. Ho (1969): Applied Optimal 
Control, Hallstead, New York. 

[6] Clarke, D.W., P.P. Kanjilal, and C. Mohtadi (1985a): ‘A
Generalized LQG Approach to Self-tuning Control, Part
I: Aspects of Design’, Int. J. Control, 41, 1509-1523.

[7] Clarke, D.W., P.P. Kanjilal, and C. Mohtadi (1985b): ‘A
Generalized LQG Approach to Self-tuning Control, Part
II: Implementation and Simulation’, Int. J.
Control, 41, 1525-1544.

[8] Clarke, D.W., C. Mohtadi, and P.S. Tuffs (1987): 
‘Generalized predictive control - Part II, extensions 
and interpretations’, Automatica, 23(2), 149-160. 

[9] IEEE Trans. on Automatic Control (1971): Special issue
on ‘The Linear Quadratic Gaussian Estimation and
Control Problem’.

[10] Kalman, R.E. (1960): ‘A new approach to linear
filtering and prediction problems’, Trans. ASME, Journal of
Basic Engineering, 82 D, 35-45.

[11] Kwakernaak, H., and R. Sivan (1972): Linear Optimal 
Control Systems, Wiley, New York. 

[12] Lam, K.P. (1982a): ‘Design of stochastic discrete time 
linear optimal regulators, Part I: Relationship between 
control laws based on a time series approach', Int. J. 
Systems Sci., 13, 979-1000. 

[13] Lam, K.P. (1982b): ‘Design of stochastic discrete time
linear optimal regulators, Part II: Extension and
computational procedures’, Int. J. Systems Sci., 13,
1001-1011.

[14] Maybeck, P.S. (1982): Stochastic Models, Estimation
and Control, Vol.3, Academic Press, New York.

[15] Meditch, J. S. (1979): Stochastic Optimal Linear
Estimation and Control, McGraw-Hill, New York.

[16] Strejc, V. (1981): State Space Theory of Discrete
Linear Control, Wiley, Chichester.



CHAPTER 14 


SMOOTHING AND FILTERING 

Data = Signal (information) + Noise (contamination)

Some data are born noisy, some data pick up noise in
transportation, while some data have noise thrust upon
them due to improper processing. The information in
the data has to be separated from the contaminations
before use.


14.1 INTRODUCTION 

The characteristic features of the information contained in
the data are (a) the frequency components present, (b) the
pattern in the data, (c) the real-timeliness of the data
etc. The data represent the behaviour of the underlying
process in terms of such features. In practice, the
information in data is rarely devoid of noise
contaminations. The noise may be inherently associated with the data,
or it may be linked with the data at a subsequent stage. For
example, (i) any data obtained from empirical measurements
(say, the sinter strength measurement discussed in Sec.
5.6.2) are generated noisy because of the empirical nature
of the measurement; (ii) the measurement of the fetal ECG
through maternal ECG is a case where noise is picked up in
transportation; again (iii) the phase-shift or time-lag
introduced in a data sequence due to exponential smoothing
or low-pass filtering (see Fig.4.2.2) is a case of noise
being thrust upon the data.

The noise associated with the data should be either
eliminated or its influence should be substantially reduced
when the data are to be used for identification, prediction
or control purposes. Both smoothing and filtering are aimed
at minimizing the effects of noise in the data. It is
important that the characteristic features of the
information contained in the data are not affected in the
process of smoothing and filtering.

By definition, smoothing implies estimation of past
values based on all the information available up to the
present time. Scientists belonging to different disciplines
often use different methods for smoothing (see for example
Sec.4.2.1). In this chapter two classes of approaches are
presented, namely optimal smoothing and bidirectional
(low-pass) filtering. Optimal smoothing algorithms designed
in the state-space framework are discussed in Sec.14.2. Three
useful classes of optimal smoothers, characterized by the
smoothing interval, the point at which the smoothed value is
desired and the lag at which smoothing is to be performed,
are studied.

Bidirectional smoothing, treated in Sec.14.3, offers a
comparatively simpler smoothing approach; it is applicable
when the frequencies contained in the signal are lower than
those contained in the noise.

Orthogonal transformation offers an alternative
approach to smoothing and filtering, where separation of the
signal is achieved through separation of the orthogonal
components; so it is conceptually different from frequency
based smoothing or filtering. This subject is introduced in
Sec.14.4. The applications of SVD in smoothing, pattern
estimation and selective filtering are explored in Sec.14.5.
A case study on the extraction of fetal ECG from maternal
ECG signal is presented in Sec.14.6.


14.2 OPTIMAL STATE-SPACE SMOOTHING 

Smoothing algorithms based on state-space models of the 
process are studied in this section. The state-space 
framework permits the use of many well-studied and 
established algorithms (like the Kalman filter etc.), for 
which numerically stable and computationally efficient 
implementations are available. A detailed study of optimal 
smoothing is beyond the scope of this book; the prime 
concepts and some of the algorithms are discussed here. 

Optimal smoothing concerns estimation of past values of
the variables of interest, based on the available
information:

$$x(k|k+i) = E\{x(k) \mid y(0), y(1), \dots, y(k+i)\},$$

that is, given the noisy measurements y(0),...,y(k+i),
the objective is to determine the optimally smoothed
estimate for x(k) in state smoothing or for y(k) in data
smoothing. In comparison with the estimate x(k|k), it is
expected that x(k|k+i), produced with the incorporation of
i additional measurements, will be more representative,
though at the cost of additional computation and complexity.

Smoothing problems belong to three main categories 
(see Fig.14.2.1) as follows. 

(1) Fixed-interval smoothing implies estimation of x(k|N),
for $0 \le k \le N$, i.e. k lying within the finite time
interval 0 to N, where N is the fixed final time.

(2) Fixed-point smoothing implies estimation of x(k|k+i), 
where k is a fixed point in time and k+i stands for all 
subsequent points in time. 

(3) Fixed-lag smoothing implies estimation of x(k|k+L) 
where k is any point in time and k+L is a point L 
(constant) steps ahead of k. 

The optimally smoothed estimate is also the minimum variance
estimate, the cost minimized being, for example in case of
fixed interval smoothing:

$$\begin{aligned}
J &= \min_{x(k|N)} E\{[x(k) - x(k|N)]^T [x(k) - x(k|N)] \mid y(k): k = 0, 1, \dots, N\} \\
&= \min_{x(k|N)} \mathrm{Tr}\, E\{[x(k) - x(k|N)][x(k) - x(k|N)]^T \mid y(k): k = 0, 1, \dots, N\}.
\end{aligned} \tag{14.2.1}$$

Different recursive algorithms have been proposed for 
solving the optimal smoothing problems (Anderson and Moore, 
1979), some of which are presented here. 


Problem formulation 

Consider the linear discrete-time process model

$$x(k+1) = A(k+1|k)\,x(k) + s(k+1|k)\,w(k), \tag{14.2.2a}$$

$$y(k) = c^T(k)\,x(k) + v(k), \tag{14.2.2b}$$

where x is an n×1 state vector, y is a scalar measurement; A
is an n×n matrix, and c and s are n×1 vectors. {w(k)} and
{v(k)} are independent, zero-mean, white noise sequences:

$$E\{w(k)w(\tau)\} = Q(k)\,\delta_{k\tau}, \qquad E\{v(k)v(\tau)\} = R(k)\,\delta_{k\tau}.$$

Figure 14.2.1 Optimal smoothing schematic (forward
filtering produces x(k|k-1); reverse filtering produces
x(k|N)).

(a) Fixed-interval smoothing: Both the estimation and the
computation time-points are fixed at N. Smoothed
estimates for all points within the interval are produced.

(b) Fixed-point smoothing: The estimation time is fixed
whereas the computation time-point moves forward with
time.

(c) Fixed-lag smoothing: Both the computation and the
estimation points move forward with fixed mutual
distance, the estimate being produced after a minimum delay
of L samples.

It will be seen that the optimal filter (Kalman filter,
Sec.6.6) is an integral part of the optimal smoothing
algorithms. For the process (14.2.2a - 14.2.2b) the Kalman
filter equations can be restated as follows.

$$x(k|k) = x(k|k-1) + k(k)\,(y(k) - c^T x(k|k-1)), \tag{14.2.3a}$$

$$x(k+1|k) = A\,x(k|k), \tag{14.2.3b}$$

$$k(k) = P(k|k-1)\,c\,(c^T P(k|k-1)\,c + R(k))^{-1}, \tag{14.2.4}$$

$$P(k|k) = [I - k(k)\,c^T]\,P(k|k-1), \tag{14.2.5}$$

$$P(k+1|k) = A\,P(k|k)\,A^T + s\,Q(k)\,s^T, \tag{14.2.6}$$

with the initial conditions $x(0|-1) = x_0$ and $P(0|-1) = P_0$;
the arguments of A, c and s are the same as in (14.2.2a -
14.2.2b).

The objective is to compute the smoothed estimate of the
state x(k): x(k|k+i) for $i \ge 1$. The smoothed estimate of the
measurement y(k) is obtained as $y(k|k+i) = c^T(k)\,x(k|k+i)$.


14.2.1 Fixed-interval Smoothing 

Fixed-interval smoothing involves bidirectional or forward 
and backward filtering. There are two main approaches. 

(1) The optimal (Kalman) filter is run in the forward
direction in time over the data set for the whole interval,
and the output of the forward filter is used by a backward
filter which is run from the end terminal to the point of
interest to obtain the smoothed estimate. In fact the
backward filter provides correction to the Kalman estimates
to produce smoothed estimates.

(2) Two separate optimal filters are operated in the 
forward and in the backward directions in time. The desired 
smoothed estimate is obtained as a weighted sum of the 
estimates produced by the forward and the backward filters 
(Fraser and Potter, 1969). 

The present studies are based on the former approach.

Rauch-Tung-Striebel (RTS) algorithm

One of the classic approaches to fixed-interval smoothing
is due to Rauch, Tung and Striebel (1965):

$$x(k|N) = x(k|k) + L(k)\,[x(k+1|N) - x(k+1|k)], \tag{14.2.7}$$

$$P(k|N) = P(k|k) + L(k)\,[P(k+1|N) - P(k+1|k)]\,L^T(k), \tag{14.2.8}$$

$$L(k) = P(k|k)\,A^T(k+1|k)\,P^{-1}(k+1|k), \tag{14.2.9}$$

for k = N-1, N-2, ..., 0,

where P(k+1|k) is the covariance of the prediction error
$\tilde{x}(k+1|k) = x(k+1) - x(k+1|k)$, and P(k|N) is the covariance of
the fixed-interval smoothing error $\tilde{x}(k|N) = x(k) - x(k|N)$.

The main disadvantages of the RTS algorithm are 
computational and numerical problems connected with (i) the 
inversion of P(k+l|k) in (14.2.9) which may be 
ill-conditioned, and (ii) the differencing of two positive 
(semi)definite matrices in (14.2.8). 
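A minimal NumPy sketch of the RTS pass (14.2.7 - 14.2.9) on top of
the forward filter (14.2.3 - 14.2.6) is given below, for a
time-invariant model with scalar measurement. The function names,
the storage layout and the two-sample random-walk example are
hypothetical, and samples are indexed 0 to N-1 rather than 0 to N.
The explicit inverse of P(k+1|k) is kept only to mirror (14.2.9);
it is exactly the step identified above as a numerical weakness.

```python
import numpy as np

def kalman_forward(y, A, c, s, Q, R, x0, P0):
    """Forward filter (14.2.3)-(14.2.6); stores everything RTS needs."""
    n, N = len(x0), len(y)
    xp = np.zeros((N + 1, n)); Pp = np.zeros((N + 1, n, n))   # x(k|k-1), P(k|k-1)
    xf = np.zeros((N, n)); Pf = np.zeros((N, n, n))           # x(k|k), P(k|k)
    xp[0], Pp[0] = x0, P0
    for k in range(N):
        f = 1.0 / (c @ Pp[k] @ c + R)
        gain = Pp[k] @ c * f                                  # (14.2.4)
        xf[k] = xp[k] + gain * (y[k] - c @ xp[k])             # (14.2.3a)
        Pf[k] = (np.eye(n) - np.outer(gain, c)) @ Pp[k]       # (14.2.5)
        xp[k + 1] = A @ xf[k]                                 # (14.2.3b)
        Pp[k + 1] = A @ Pf[k] @ A.T + Q * np.outer(s, s)      # (14.2.6)
    return xf, Pf, xp, Pp

def rts_smoother(xf, Pf, xp, Pp, A):
    """Backward RTS pass; the last filtered estimate is already smoothed."""
    xs, Ps = xf.copy(), Pf.copy()
    for k in range(len(xf) - 2, -1, -1):
        L = Pf[k] @ A.T @ np.linalg.inv(Pp[k + 1])            # (14.2.9)
        xs[k] = xf[k] + L @ (xs[k + 1] - xp[k + 1])           # (14.2.7)
        Ps[k] = Pf[k] + L @ (Ps[k + 1] - Pp[k + 1]) @ L.T     # (14.2.8)
    return xs, Ps

# Hypothetical scalar random walk observed in noise, two samples.
y = np.array([1.0, 2.0])
A = np.array([[1.0]]); c = np.array([1.0]); s = np.array([1.0])
xf, Pf, xp, Pp = kalman_forward(y, A, c, s, Q=1.0, R=1.0, x0=[0.0], P0=[[1.0]])
xs, Ps = rts_smoother(xf, Pf, xp, Pp, A)
print("filtered:", xf.ravel(), "smoothed:", xs.ravel())
```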

Modified Bryson-Frazier (mBF) algorithm

The modified Bryson-Frazier algorithm is a numerically
stable and computationally efficient formulation for
fixed-interval smoothing (Bryson and Frazier, 1963; Bierman,
1977, p.223). The basic idea is (a) to express the smoothed
estimate of the state and the covariance in terms of two
adjoint or intermediate variables (say, $\lambda$ and $\Lambda$), and
(b) to perform the recursive backward filtering using the
adjoint variables instead of the states and the covariances.
The mBF algorithm is stated as follows

$$x(k|N) = x(k|k-1) + P(k|k-1)\,\lambda(k|N), \tag{14.2.10}$$

$$P(k|N) = P(k|k-1) - P(k|k-1)\,\Lambda(k|N)\,P(k|k-1), \tag{14.2.11}$$

where k = N-1, N-2,..., 1, the n×1 adjoint vector $\lambda$ and the
n×n adjoint matrix $\Lambda$ satisfy the following recursive
equations:

$$\lambda(k|N) = A_a^T\,\lambda(k+1|N) + c\,f\,\tilde{y}(k|k-1), \tag{14.2.12}$$

$$\Lambda(k|N) = A_a^T\,\Lambda(k+1|N)\,A_a + c\,f\,c^T, \tag{14.2.13}$$

where the n×n matrix $A_a$ and the scalar f are defined with
appropriate arguments as

$$A_a(k+1|k) = A(k+1|k)\,[I - k(k)\,c^T(k)], \tag{14.2.14}$$

$$f(k) = (c^T(k)\,P(k|k-1)\,c(k) + R(k))^{-1}, \tag{14.2.15}$$

$$\tilde{y}(k|k-1) = y(k) - c^T(k)\,x(k|k-1),$$

k = N, N-1,..., 1, and the terminal conditions are
$\lambda(N+1|N) = 0$ and $\Lambda(N+1|N) = 0$. k(k) and P(k|k-1) are
the Kalman filter gain and the covariance matrix respectively
corresponding to the forward filtering (14.2.3 - 14.2.6).
$\Lambda(k|N)$ is the covariance of $\lambda(k|N)$:

$$\Lambda(k|N) = E\{\lambda(k|N)\,\lambda^T(k|N)\};$$

$\lambda(k|N)$ is zero mean.

Different designs of smoothing algorithms based on
adjoint or intermediate variables have been proposed (see
for example Watanabe (1986)), the main objectives being
improved numerical stability and computational efficiency.

Special features 

(a) Duality with Kalman filter 

The backward recursion of the adjoint variables is dual of 
the Kalman filter computations; Equations (14.2.12) and 
(14.2.13) are dual of (14.2.3) and (14.2.6) respectively. 

(b) Implementation aspects 

The mBF algorithm involves the following steps: 

(1) Perform forward filtering (14.2.3 - 14.2.6) over the
specified interval, k = 0, 1, ..., N.

(2) Perform backward recursions (14.2.12 - 14.2.13) to
compute the adjoint variables $\lambda(k|N)$ and $\Lambda(k|N)$.

(3) Compute the smoothed estimate $\hat{x}(k|N)$ using (14.2.10).
Computation of the error covariance P(k|N) in
(14.2.11) is optional.
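The three steps above can be sketched in Python with NumPy. This is only an illustrative sketch under stated assumptions, not the FORTRAN implementation referred to in the text: the model is time-invariant with a scalar measurement, the forward filter is written in its standard predictor/corrector form (the filter equations (14.2.3 - 14.2.6) are not reproduced here), and the function name `mbf_smoother` and its argument names are hypothetical.

```python
import numpy as np

def mbf_smoother(y, A, c, Q, R, x0, P0):
    """Fixed-interval smoothing with adjoint variables (scalar measurement).

    Assumed model: x(k+1) = A x(k) + w(k), y(k) = c^T x(k) + v(k),
    with cov(w) = Q, var(v) = R; x0, P0 stand for x(0|-1), P(0|-1).
    """
    n, N = len(x0), len(y)
    x_pred = np.zeros((N, n))
    P_pred = np.zeros((N, n, n))
    gains = np.zeros((N, n))
    f = np.zeros(N)
    innov = np.zeros(N)
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)

    # Step (1): forward Kalman filtering, storing what the smoother needs.
    for k in range(N):
        x_pred[k], P_pred[k] = x, P
        f[k] = 1.0 / (c @ P @ c + R)              # (14.2.15)
        innov[k] = y[k] - c @ x                   # innovation
        gains[k] = P @ c * f[k]                   # Kalman gain k(k)
        x_filt = x + gains[k] * innov[k]
        P_filt = P - np.outer(gains[k], c) @ P
        x = A @ x_filt
        P = A @ P_filt @ A.T + Q

    # Steps (2) and (3): backward recursion of the adjoint variables.
    lam = np.zeros(n)                             # lambda(N+1|N) = 0
    Lam = np.zeros((n, n))                        # Lambda(N+1|N) = 0
    x_s = np.zeros((N, n))
    P_s = np.zeros((N, n, n))
    for k in range(N - 1, -1, -1):
        A_a = A @ (np.eye(n) - np.outer(gains[k], c))        # (14.2.14)
        lam = A_a.T @ lam + c * f[k] * innov[k]              # (14.2.12)
        Lam = A_a.T @ Lam @ A_a + f[k] * np.outer(c, c)      # (14.2.13)
        x_s[k] = x_pred[k] + P_pred[k] @ lam                 # (14.2.10)
        P_s[k] = P_pred[k] - P_pred[k] @ Lam @ P_pred[k]     # (14.2.11)
    return x_s, P_s
```

For a constant scalar state (A = 1, Q = 0) every smoothed estimate should equal the posterior mean given all the data, which gives a simple check of the recursion. Note that no matrix inversion appears in the backward pass.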

The duality with Kalman filtering (or often more directly 
with optimal control), permits use of stable and efficient 
implementations, as illustrated in the following example. 

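Example: Consider an AR process:

$$y(k) + a_1 y(k-1) + \dots + a_n y(k-n) = v(k),$$

where {v(k)} is a sequence of uncorrelated equation errors.
The objective is to compute the fixed interval smoothed
measurement sequence $\{\hat{y}(k)\}$, k = 0, 1, ..., N.

Let the corresponding state-space model be given by

$$x(k+1) = A(k+1|k)x(k) + s(k+1|k)w(k),$$

$$y(k) = c^T(k)x(k) + v(k),$$

where

$$A = \begin{bmatrix} -a_1 & 1 & 0 & \cdots & 0 \\ -a_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ -a_{n-1} & 0 & 0 & \cdots & 1 \\ -a_n & 0 & 0 & \cdots & 0 \end{bmatrix}, \quad s = \begin{bmatrix} -a_1 \\ -a_2 \\ \vdots \\ -a_{n-1} \\ -a_n \end{bmatrix},$$

$c^T = [1\ 0\ \dots\ 0]$, and v = w.

If $k = [k_1\ k_2\ \dots\ k_n]^T$,

$$A_a = \begin{bmatrix} -a_1(1-k_1)-k_2 & 1 & 0 & \cdots & 0 \\ -a_2(1-k_1)-k_3 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ -a_{n-1}(1-k_1)-k_n & 0 & 0 & \cdots & 1 \\ -a_n(1-k_1) & 0 & 0 & \cdots & 0 \end{bmatrix}.$$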

Thus (14.2.13) is rendered the dual of the covariance time- 
update (13.7.3) of the state-space formulation of the 
optimal LQ control problem; hence the vector implementation 
using U-D factorization (Appendix 13C) can be directly used, 
where $A_a$ and f are substituted for A and q (see (13C.2a))
respectively in the FORTRAN implementation. The rest of the 
smoothing problem is straightforward. 
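The structure of $A_a$ in this example can be checked numerically. The sketch below builds the companion-form matrices for an AR model and forms $A_a = A(I - kc^T)$ as in (14.2.14). The function names and the numerical values of the coefficients and of the gain are illustrative assumptions only.

```python
import numpy as np

def ar_state_space(a):
    """Companion form for y(k) + a1*y(k-1) + ... + an*y(k-n) = v(k)."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    A = np.zeros((n, n))
    A[:, 0] = -a                    # first column holds -a_1 ... -a_n
    A[:-1, 1:] = np.eye(n - 1)      # ones on the superdiagonal
    s = -a.copy()
    c = np.zeros(n)
    c[0] = 1.0
    return A, s, c

def adjoint_transition(A, c, gain):
    """A_a = A (I - k c^T), as in (14.2.14)."""
    return A @ (np.eye(len(c)) - np.outer(gain, c))

a = [0.5, -0.3, 0.1]                # illustrative AR coefficients
k = np.array([0.4, 0.2, 0.1])       # illustrative gain vector
A, s, c = ar_state_space(a)
A_a = adjoint_transition(A, c, k)
print(A_a)
```

Only the first column of A changes; its i-th element becomes $-a_i(1-k_1) - k_{i+1}$, with the last element $-a_n(1-k_1)$, so $A_a$ retains the companion structure.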


(c) Data storage requirements 

The present algorithm requires storing the n-vector k(k)
and the scalars f(k) and $\tilde{y}(k|k-1)$ for k = 0, 1, ..., N, which
are produced by the forward filtering. In addition, $\hat{x}(k|k-1)$
and P(k|k-1) need to be stored only for the specific values of
k for which the smoothed estimate $\hat{x}(k|N)$ is to be computed.
Thus the storage requirement is much less than that of the
RTS algorithm, which requires $\hat{x}(k|k)$, $\hat{x}(k+1|k)$, P(k|k) and
P(k+1|k) to be stored for each value of k.

(d) Stability 

The present algorithm is numerically stable because (i) no 
matrix inversions are involved and (ii) the smoothing 
recursions are computed using the adjoint variables rather 
than the smoothing error covariances P(k|N). 

In fact, P(k|N) need not be computed at all except for
diagnostic purposes.
14.2.2 Fixed-point Smoothing

Often, the initial conditions or the conditions at a specific
time point are of prime concern as the process or the
experiment progresses with time, e.g. the time of firing of
a rocket, when a satellite is tracked from a ground-station.
In such cases, the objective is to estimate $\hat{x}(k|j)$, where
j > k, which is given by the following algorithm.

$$\hat{x}(k|j) = \hat{x}(k|j-1) + g(k|j)\,\tilde{y}(j|j-1), \tag{14.2.16}$$

$$g(k|j) = S(k|j)\,c(j)\,f(j), \tag{14.2.17}$$

$$S(k|j) = S(k|j-1)\,A_a^T(j|j-1), \tag{14.2.18}$$

for j = k+1, k+2, ..., where the n×n matrix $A_a$ and the scalar f
are defined as

$$A_a(j|j-1) = A(j|j-1)\,[I - k(j-1)c^T(j-1)],$$

$$f(j) = (c^T(j)P(j|j-1)c(j) + R(j))^{-1},$$

and

$$\tilde{y}(j|j-1) = y(j) - c^T(j)\hat{x}(j|j-1).$$

The iterations are initialized with

$$\hat{x}(k|j-1) = \hat{x}(k|k), \quad \text{and}$$

$$S(k|j) = S(k|k+1) = P(k|k)A^T(k+1|k),$$

in (14.2.16) and (14.2.17) respectively.

If necessary, P(k|j), the covariance of the fixed-point
smoothing error, $\tilde{x}(k|j) = x(k) - \hat{x}(k|j)$, may be computed from

$$P(k|j) = P(k|j-1) - g(k|j)\,f^{-1}(j)\,g^T(k|j), \tag{14.2.19}$$

with the initialization, P(k|j-1) = P(k|k).

Summary 

(1) Perform forward filtering (14.2.3 - 14.2.6) over the 
available data up to the specified point in time k. 

(2) From the point k onwards, for each time-step, perform
smoothing iterations (14.2.16 - 14.2.18), following the
forward filtering, to compute $\hat{x}(k|j)$, where j = k+1,
k+2, etc.

(3) Computation of error covariance P(k|j) is optional.

Remark: The algorithm discussed here is forward recursive in
time, and hence can be performed on-line. The smoother does
not require prior information of the final measurement.
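The on-line character of the fixed-point smoother can be seen in the following Python sketch, where the smoothed estimate of x(k) is refined inside the same loop that runs the forward filter. It assumes a time-invariant model with scalar measurements and a standard predictor/corrector Kalman filter. The cross-covariance S is propagated by post-multiplying with the transpose of $A_a$, and the covariance update divides by f(j). The function name `fixed_point_smoother` and its arguments are hypothetical.

```python
import numpy as np

def fixed_point_smoother(y, A, c, Q, R, x0, P0, k):
    """Return x(k|j) and P(k|j) for j = k, ..., N-1 (scalar measurements).

    x0, P0 stand for the prior prediction x(0|-1), P(0|-1).
    """
    n = len(x0)
    I = np.eye(n)
    x = np.asarray(x0, dtype=float)
    P = np.asarray(P0, dtype=float)
    xs = Ps = S = None
    estimates, covariances = [], []
    for j in range(len(y)):
        f = 1.0 / (c @ P @ c + R)            # f(j)
        innov = y[j] - c @ x                 # innovation at time j
        if j > k:
            g = S @ c * f                    # smoother gain (14.2.17)
            xs = xs + g * innov              # (14.2.16)
            Ps = Ps - np.outer(g, g) / f     # (14.2.19)
        gain = P @ c * f                     # Kalman gain k(j)
        x_filt = x + gain * innov
        P_filt = (I - np.outer(gain, c)) @ P
        if j == k:
            xs, Ps = x_filt.copy(), P_filt.copy()
            S = P_filt @ A.T                 # S(k|k+1) = P(k|k) A^T
        elif j > k:
            A_a = A @ (I - np.outer(gain, c))
            S = S @ A_a.T                    # propagate S to the next step
        if j >= k:
            estimates.append(xs.copy())
            covariances.append(Ps.copy())
        x = A @ x_filt                       # time update of the filter
        P = A @ P_filt @ A.T + Q
    return np.array(estimates), np.array(covariances)
```

Each new measurement y(j) refines the estimate of the fixed state x(k) without reprocessing earlier data, and no knowledge of the final measurement index is needed.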
14.2.3 Fixed-lag Smoothing

The fixed-lag smoothing problem can be configured as a 
fixed-interval smoothing problem, the objective being to 
compute the smoothed estimate x(k|k+L) of x(k), after a 
delay of L time points, where L is fixed. The implementation 
involves the following steps: 

(1) Compute forward filtering (14.2.3 - 14.2.6) up to the 
time point (k+L). 

(2) Perform backward recursions of the adjoint variables $\lambda$
and $\Lambda$ in (14.2.12 - 14.2.13) to determine $\lambda(k|k+L)$ and
$\Lambda(k|k+L)$, the initial conditions being $\lambda(k+L+1|k+L) = 0$
and $\Lambda(k+L+1|k+L) = 0$.

(3) Compute fixed-lag smoothed estimate x(k|k+L) from 
(14.2.10). 

For further details, refer to Sec.14.2.1. 


14.2.4 Observations and Comparative Study 

(a) Relationship with Optimal Filter 

The Kalman filter is an integral part of the optimal
smoother, which is also linear and has the same dimension as
the Kalman filter. The following comparative features are
worth noting.

(1) Like the optimal filter, all the presented smoothing
algorithms are driven by the innovations process

$$\tilde{y}(k|k-1) = y(k) - c^T(k)\hat{x}(k|k-1).$$

(2) As in the case of the optimal filter, the time points 
k, k-1,... etc. need not be equispaced for the optimal 
smoother. 

(3) Unlike the optimal filter, the error covariances
P(k|k+i) are not an integral part of the smoothing
iterations and hence need not be computed; they can
be used for diagnostic purposes.

(4) The stochastic error processes for all three categories 
of optimal smoothing are Gauss-Markov-2 processes, 
unlike optimal filtering and prediction cases which are 
Gauss-Markov processes (Sec.2.2.2). 

(b) Improvements due to smoothing 

(1) By definition, the optimally smoothed estimate produces
the minimum smoothing error covariance P(k|N). So a
comparison between the diagonal elements of P(k|N) and
those of the filter covariance P(k|k) gives first-hand
information about the improvement in the state
estimates due to smoothing.

(2) The improvement due to smoothing is realizable when 
there is considerable driving noise w and measurement 
noise v. 

(3) The improvement due to smoothing, given by
(P(k|k) - P(k|N)), increases with the increase in N. In
fact, the rate of increase depends on the dominant time
constant of the forward Kalman filter, which is given by
the eigenvalues of $(A - kc^T)$. In general the smoothing
is most effective over the region two to three
times the dominant time constant of the Kalman filter
from k.

(4) For a limited horizon, P(k|N) is relatively higher for
values of k close to N, which is due to the transient
state of the backward filter.

(c) Optimality 

The optimality of optimal smoothing is confined to the
condition of satisfying the minimum mean square estimation
error criterion (for example (14.2.1)). However, it does not
necessarily guarantee minimum phase shifts or time-delay
consequent to smoothing, which is important particularly in
prediction studies.

(d) Miscellaneous features 

(1) The three broad categories of optimal smoothing are
mutually equivalent (i.e. one can be derived from the
others), although they concern different ways of
configuring the optimal smoothing problem.

(2) For short data sets, fixed-interval smoothing is most 
appropriate. 


14.3 Bidirectional Filtering 

Two fundamental issues related to smoothing or filtering are 

(i) the mean square estimation error and 

(ii) the real-timeliness of the estimates.

Often it is necessary to perform smoothing or filtering of
the data before use in the predictor. This is to separate
high frequency noise from the data. However, smoothing or
filtering should not damage the real-timeliness of the data
(for example, it should not introduce lag or time delay in
the data), as otherwise the prediction performance
deteriorates. In this section, first some basic off-line
approaches to time-lag-free smoothing are discussed. One
problem with the off-line method is the inability to process
the terminal data. So the concept of real-time bidirectional
filtering is introduced, where a real-time predictor is used
in combination with the filter.


14.3.1 Off-line Method 

Here, frequency domain low-pass filtering is performed 
bidirectionally. In other words, the data are passed through 
the same filter in opposite directions to compensate for the 
time-delay and finally delay-free estimates are produced. 

The implementation of the filter is detailed in
Fig.14.3.1(a); it involves the following steps:

(a) a forward pass of the data through the chosen filter,

(b) reversal in time of the filtered data obtained from (a),

(c) a forward pass of the filtered data obtained from (b)
through the same filter as in (a),

(d) reversal in time of the filtered data so obtained,
which are the phase-shift free smoothed estimates.

An alternative but equivalent scheme is also presented in
Fig.14.3.1(b).
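Steps (a) to (d) can be sketched in plain Python as follows. The sketch assumes a first order low-pass filter with unity gain at zero frequency and a pole at `alpha`; it starts each pass with the filter state set to the first sample so that a constant input produces no start-up transient. The start-up choice and the function names are illustrative assumptions.

```python
def first_order_lowpass(data, alpha):
    """One causal pass: out(k) = alpha*out(k-1) + (1 - alpha)*data(k)."""
    out = []
    prev = data[0]                  # assumed start-up state
    for value in data:
        prev = alpha * prev + (1.0 - alpha) * value
        out.append(prev)
    return out

def bidirectional_filter(data, alpha):
    forward = first_order_lowpass(data, alpha)       # step (a)
    reversed_once = forward[::-1]                    # step (b)
    second = first_order_lowpass(reversed_once, alpha)  # step (c)
    return second[::-1]                              # step (d)

# An isolated pulse is spread symmetrically about its original position,
# which is the time-domain signature of zero phase shift.
pulse = [0.0] * 101
pulse[50] = 1.0
smoothed = bidirectional_filter(pulse, 0.6)
print(smoothed[48:53])
```

The lag introduced by the first pass is cancelled by the equal lag in the opposite direction introduced by the second pass. Near the two ends of the record the result is affected by the start-up of each pass.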

Design and analysis 

The design of the bidirectional filter involves two main 
issues: 

(1) The extent of filtering, that is the highest frequency
to be permitted in the filtered data, is a prime
consideration. If a first order filter given by

$$F(q^{-1}) = \frac{1-\alpha}{1-\alpha q^{-1}}, \quad 0 < \alpha < 1, \tag{14.3.1}$$

is used, a slow pole (that is a higher value of $\alpha$) will have
a lower cut-off frequency (that is, only relatively lower
frequencies will be passed, as discussed in Appendix 14A).




Figure 14.3.1 Bidirectional filter schemes: 

(a) One data reversal following each of the two passes 
through the same filter produces filtered data; 

(b) Same lag or phase-shift in opposite direction 
produced by single pass of the data and the reversed 
data through two similar filters; the filtered data 
combine in phase to produce the final output. 

Whereas too fast a pole will not adequately eliminate the
high frequency components, too slow a pole is likely to
eliminate useful information in the data. So appropriate
design of the filter is important.

(2) The final estimates obtained through bidirectional
filtering will ideally have zero time-lag, if the
characteristics of the forward and the reverse filters are
identical.
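The dependence of the cut-off frequency on the pole position, noted in item (1), can be checked with a short calculation. The sketch assumes the first order filter has unity gain at zero frequency, i.e. $F(q^{-1}) = (1-\alpha)/(1-\alpha q^{-1})$, and takes the cut-off as the frequency at which the gain of a single pass falls to $1/\sqrt{2}$. Both the definition of cut-off and the function names are assumptions made for illustration.

```python
import math

def magnitude(alpha, omega):
    """|F(e^{j*omega})| for F(q^-1) = (1 - alpha) / (1 - alpha*q^-1)."""
    return (1.0 - alpha) / math.sqrt(
        1.0 - 2.0 * alpha * math.cos(omega) + alpha ** 2)

def cutoff_frequency(alpha):
    """Frequency (rad/sample) where the single-pass gain is 1/sqrt(2).

    Obtained by solving |F|^2 = 1/2 for cos(omega); a solution exists
    only for alpha >= 3 - 2*sqrt(2) (about 0.17).
    """
    cos_wc = (4.0 * alpha - 1.0 - alpha ** 2) / (2.0 * alpha)
    return math.acos(cos_wc)

for alpha in (0.6, 0.88, 0.9):
    print(alpha, cutoff_frequency(alpha))
```

Under these assumptions the cut-off is about 0.52 rad/sample for a pole at 0.6 and about 0.11 rad/sample for a pole at 0.9. In bidirectional use the data pass through the filter twice, so the overall gain at any frequency is the square of the single-pass gain.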

Remarks 

(1) Longini et al (1975) describe a hardware implementation
of bidirectional filtering. The electronically taped data
are passed through a filter and are retaped. The tape
containing the filtered (retaped) data is played in
(physically) reversed direction through the same filter; the
consequent output is taped, which when run in reverse
direction produces time-delay-free filtered data.

Figure 14.3.2 Bidirectional filtering of noisy green-
mix permeability data with $\alpha$ = 0.6 in (14.3.1).

(2) The bidirectional feature is implicit with the 
state-space formulation of the optimal smoothing methods 
incorporating a forward filter and a reverse filter. 

(3) The centred moving averaging (Sec.4.2.1 and Appendix
4) is also a form of bidirectional filtering.

Example 14.3.1 Filter the green-mix permeability data. 

The permeability of green-mix is an important process vari-
able in the iron-ore sintering process (discussed in
Sec.5.6.1). The series (see Appendix 14B) contains noisy
data sampled every 2 minutes for 5 hours from an iron and
steel plant.

A first order filter (14.3.1) is used with $\alpha$ = 0.6. The
results of bidirectional filtering are presented in
Fig.14.3.2; the delay-free output of the bidirectional
filter is shown by the bold line.


Example 14.3.2 Determine the trend in the German unemploy-
ment series using bidirectional filtering.

The trend of a time series is basically the slowly varying
component, which can be separated by low-pass filtering
with sufficiently low cut-off.

Figure 14.3.3 (a) Trend estimation of the German
unemployment series (Appendix 7E) with $\alpha$ = 0.9.
(b) Trend extracted German unemployment series.

The present series (Fig.14.3.3(a)) is a monthly data
sequence with yearly periodicity. A bidirectional filter
with $\alpha$ = 0.9 is chosen. The trend extracted residual series
is shown in Fig.14.3.3(b).

The sensible choice for $\alpha$ requires the consideration of
the lowest frequency component that can be present due to
the yearly periodicity; the cut-off point of the filter has
to be chosen lower (see Sec.2.5.4).


14.3.2 Real-time Filtering 

The term real-time signifies relevance to time. The objec- 
tive of real-time filtering is to filter the current data 
with minimum time lag, to be available for immediate use. 

It is known that the smoothing characteristics of 
conventional smoothers including the optimal smoothers 
deteriorate considerably towards the terminal points. In 
conventional unidirectional filtering, including Kalman 
filtering, the filtered estimates suffer from considerable 
phase-shift. Ideally, the phase-shift or lag should be zero, 
particularly if the filtered values are to be used by a 
predictor. Real-time filtering with minimum delay can be 
achieved by trying to imitate the bidirectional filtering 
process on the extrapolated data set. In other words, the
available data are used for multistep prediction (say,
$\hat{y}(k+1|k), \dots, \hat{y}(k+p|k)$), and then the bidirectional
filtering is performed on the extended data (say,
$y(0), y(1), \dots, y(k), \hat{y}(k+1|k), \dots, \hat{y}(k+p|k)$)
to produce the desired estimate $\hat{y}(k|k)$. The consequent
minimization of lag or bias in the data will be largely
dependent on the quality of prediction.
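The scheme can be illustrated with a small Python sketch. The predictor used here, a straight-line continuation of the last two samples, is only a hypothetical stand-in for whatever multistep predictor is actually available. The filter is assumed to be first order with unity gain at zero frequency, and all function names are illustrative.

```python
def first_order_lowpass(data, alpha):
    """One causal pass: out(k) = alpha*out(k-1) + (1 - alpha)*data(k)."""
    out = []
    prev = data[0]                  # assumed start-up state
    for value in data:
        prev = alpha * prev + (1.0 - alpha) * value
        out.append(prev)
    return out

def bidirectional_filter(data, alpha):
    """Forward pass, time reversal, second pass, time reversal."""
    backward = first_order_lowpass(first_order_lowpass(data, alpha)[::-1], alpha)
    return backward[::-1]

def linear_extrapolation(data, steps):
    """Hypothetical predictor: continue the last observed slope."""
    slope = data[-1] - data[-2]
    return [data[-1] + slope * (i + 1) for i in range(steps)]

def real_time_estimate(data, alpha, steps, predictor=linear_extrapolation):
    """Filtered value at the current time k, using p = steps predictions."""
    extended = list(data) + predictor(data, steps)
    return bidirectional_filter(extended, alpha)[len(data) - 1]

ramp = [float(i) for i in range(60)]
print(first_order_lowpass(ramp, 0.6)[-1])    # unidirectional: lags the ramp
print(real_time_estimate(ramp, 0.6, 30))     # bidirectional on extended data
```

On a noise-free ramp the unidirectional filter settles about 1.5 samples behind the data (for a pole at 0.6), whereas the estimate obtained from the extended data set is essentially lag-free. With real data the benefit is limited by the quality of the predictions.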

Real-time filtering of the data has been used for 
sinter quality prediction discussed in Secs.5.6.2 and 5.6.3. 


14.4 SMOOTHING AND FILTERING USING ORTHOGONAL 
TRANSFORMATION 

Introduction 

As discussed in Chapter 7, orthogonal transformation results 
in compaction and relative decorrelation of information in
the transformed components of the data. Consequently 
separation of unwanted transformed components from the rest 
is easier. Usually the contaminations in the data, which are 
uncharacteristic of the underlying process, are confined to 
transformed components with low energy content or low signal 
strength, which are eliminated. Orthogonal transformation 
being a linear and reversible operation, data smoothing 
involves three basic steps:

(1) orthogonal transformation of the data, 

(2) the elimination of the unwanted or insignificant trans- 
formed components, and 

(3) the data reconstruction through reverse transformation. 
The intermediate step of elimination of unwanted components 
causes actual smoothing. Besides smoothing, selective 
filtering is also possible, where the signal of interest may 
or may not be the prime component. 

The degree of concentration of information in the
transformed components and the degree of decorrelation among
different components will determine how efficiently
smoothing and filtering can be performed. The numerical
robustness properties of the orthogonal decomposition will 
also be associated with the smoother or the filter. 

The present studies are limited to the use of singular 
value decomposition for smoothing and filtering, although 
other types of orthogonal transformations can also be used. 

Comparison with optimal smoothers 

As against the state-space based optimum smoothers discussed
in Sec.14.2, the following comparative features of orthogonal
transformation based approaches can be observed.

(a) The orthogonal transformation based smoother has
structural similarity with the fixed interval smoother. For
the fixed interval smoother, the smoothing error covariance
matrix P(k|N) can be used as a diagnostic measure, whereas in
the case of the orthogonal transformation smoothers the
energy contained in the signal and that in the eliminated
components can provide a measure of the relative strength of
the noise.

(b) One fundamental structural difference between the
state-space smoother and the orthogonal transformation
smoother is that the former assumes causality, whereas the
latter does not; it is mainly based on the eigen properties.
Hence while the noise statistics have to be prespecified in
the case of state-space smoothers, this is not required for
the orthogonal transformation based smoother.

(c) For state-space smoothers the smoothing property
deteriorates towards the end of the data sequence and the
error covariance tends to increase. In the case of the
orthogonal transformation smoother, the smoothing depends on
the total information contained in the data set concerned,
and the smoothing performance is not expected to be
different towards the terminals; the indifference to
causality is another way of explaining the same thing.

Remarks 

(a) Since Fourier transformation decomposes data into
frequency components, the frequency domain filtering is
equivalent to the separation of components after Fourier
transformation.

(b) Different classes of orthogonal transformation result in
different types of decompositions, and the corresponding
subsequent smoothing or filtering algorithm will have its
own characteristic features. For example, the singular value
decomposition (SVD) based filtering can be independent of
the frequency distributions of the data, and hence can be
quite different from Fourier transformation based filtering;
in the case of the latter the orthogonality is in the
frequency domain, whereas in the case of SVD the
orthogonality is in the information space corresponding to
the algebraic configuration of the data matrix.


14.5 SMOOTHING AND FILTERING USING SVD 

Applications of singular value decomposition to nearly
repetitive periodic processes are considered, where the
pattern, the length of period and the average value over a
period may all change to a limited extent. The three specific
applications studied are smoothing, pattern estimation and
selective extraction of pattern through filtering.

Singular value decomposition has been discussed in
detail in Sec.7.6. SVD of a real m×n matrix A is given by

$$A = USV^T;$$

if min(m, n) = p, and if A is of rank r ($\le$ p), then

$$A = \sum_{i=1}^{r} u_i s_i v_i^T, \tag{14.5.1}$$

where $s_i$ are the singular values: $s_1 \ge s_2 \ge \dots \ge s_r > 0$, and
$s_{r+1} = s_{r+2} = \dots = s_p = 0$. $u_i s_i v_i^T$ and $u_j s_j v_j^T$ are
orthogonal to each other for $i \ne j$, where i and j are between
1 and r.
Figure 14.5.1 The smoothing of the atmospheric ozone
column series. 

14.5.1 Smoothing 

In the present context, smoothing is obtained through
successive rejection of orthogonal components starting from
the end having lowest energy. In (14.5.1), the component
having the lowest energy is given by $u_r s_r v_r^T$. If $\hat{S}$ is the
diagonal matrix of g singular values after elimination of
the (r-g) smallest singular values, the corresponding
truncated singular vector matrices being $\hat{U}$ and $\hat{V}$, the
smoothed data matrix will be given by

$$\hat{A} = \hat{U}\hat{S}\hat{V}^T,$$

where $\hat{U}$, $\hat{S}$ and $\hat{V}^T$ are m×g, g×g and g×n matrices
respectively. There is no definite rule regarding how many
modes connected with the smaller singular values should be
eliminated. Usually it is based on the relative strength
with respect to the prime mode $u_1 s_1 v_1^T$, which is given by
the ratio $s_i/s_1$ for the i-th decomposition component.
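The truncation described above can be sketched with NumPy as follows. The function names are illustrative, and the helper for periodic series assumes a fixed period length with any incomplete final period discarded.

```python
import numpy as np

def svd_smooth(A, g):
    """Keep only the g components with the largest singular values."""
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    return U[:, :g] @ np.diag(s[:g]) @ Vt[:g, :]

def smooth_periodic_series(series, period, g=1):
    """Arrange a series row-wise (one period per row), smooth, and flatten.

    Assumes a fixed period length; a trailing incomplete period is dropped.
    """
    rows = len(series) // period
    A = np.asarray(series[: rows * period], dtype=float).reshape(rows, period)
    return svd_smooth(A, g).ravel()

pattern = np.array([1.0, 3.0, 2.0, 0.5])
scales = np.array([1.0, 1.2, 0.8])
clean = np.outer(scales, pattern)            # exactly rank 1
noisy = clean + 0.01 * np.array([[1, -1, 1, -1],
                                 [-1, 1, 1, -1],
                                 [1, 1, -1, -1]])
smoothed = svd_smooth(noisy, 1)
print(np.linalg.matrix_rank(smoothed))
```

The choice of g remains a judgement based on the ratios $s_i/s_1$; the sketch simply applies whichever g is supplied.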

An application of SVD for data smoothing follows. 

Example 14.5.1 Smoothing of monthly series of ozone column 
thickness 


The monthly data on the thickness of the ozone column
show yearly periodicity with a large amount of variability
between the periods (Fig.14.5.1). The data are given in
Appendix 7B. For this study, data for 21 years (1932 to 1952)
are used.

The data are arranged into a 21×12 matrix A, with the data
for each year forming one row. The singular values of A are
obtained as follows:

$s_1$ to $s_{12}$: 5390.1, 121.9, 86.4, 79.3, 67.2, 56.7,
49.8, 41.1, 34.9, 32.4, 22.4, 19.3.

The smoothed $\hat{A}$ is constructed by considering only the most
dominant singular value (i.e. g = 1). The smoothed series is
reconstructed from $\hat{A}$ and is as shown in Fig.14.5.1.
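The dominance of the first component can be quantified directly from the singular values listed above. The short calculation below takes the energy of a component as the square of its singular value; this is a standard convention, since the sum of the squared singular values equals the sum of the squared elements of A, but it is adopted here as an assumption.

```python
singular_values = [5390.1, 121.9, 86.4, 79.3, 67.2, 56.7,
                   49.8, 41.1, 34.9, 32.4, 22.4, 19.3]

s1 = singular_values[0]
relative_strength = [s / s1 for s in singular_values]   # ratios s_i / s_1
energies = [s * s for s in singular_values]
energy_share_of_first = energies[0] / sum(energies)

print(relative_strength[1])
print(energy_share_of_first)
```

The second singular value is only about 2.3 per cent of the first, and the first component carries more than 99.8 per cent of the total energy, which supports the choice g = 1.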

Remark: This scheme of smoothing may be applied for the
reconstruction of the missing data. Smoothing is performed 
with interpolated data in place of the missing data; the 
corresponding smoothed estimates are the reconstructed data. 


14.5.2 Pattern Estimation 

The problem addressed here is the identification of a nearly 
repetitive pattern from multiple sets of data from the same 
experiment or process, using SVD. Here, the term pattern is 
used to mean the distribution of the data series over a 
period. For example (a) stress tests on blocks of the same
metal will show almost similar patterns if a number of
experiments are conducted, or (b) the power load demand on a
substation for a particular day of the week will show
similar patterns for a number of consecutive weeks. Such
processes can be characterized by the following features:

(a) Repetitiveness of the process.

(b) Periodic or nearly periodic nature, i.e. the period
length may be the same, or it may vary to some extent. For
example, the number of measurements recorded for
different experiments may not be exactly the same when
repeated.

(c) Absence of any conspicuous trend. In cases where there 
is a trend, it has to be removed before pattern 
estimation. 

The objective is to estimate one pattern spanning one period 
from the available periodic or nearly periodic sets of data. 

The characterization of periodic and nearly periodic
processes through singular value decomposition is discussed
in Sec.7.7.1, which is used for pattern estimation in the
present case.


Problem formulation 


It is assumed that the data sets concerned are nearly 
periodic. The period length may be fixed or may vary to some 
extent. In the latter case, the optimum period length is 
determined using the SVR spectrum of the data series formed 
from the available successive periods; the periods, having 
period length different from the optimum, are stretched or 
compressed (as is necessary) uniformly to the optimum period 
length as discussed in Sec.11.2.1. A simplified alternative
is to use the maximum period length in the data as n and to
use interpolated data wherever necessary to make periods
equal in length.

The m consecutive periods of the data are arranged in
an m×n matrix A so that the consecutive periods are aligned
into the consecutive rows of A; n is the period length. Any
missing data may be replaced with interpolated data.

It is known (Sec.7.7.1) that a matrix with nearly
repetitive rows will produce one prime singular value on
singular value decomposition (SVD), with the other singu-
lar values being insignificantly small, i.e. $s_1 \gg s_2$ in S:

$$A = USV^T, \quad U = [u_1\ u_2\ \dots\ u_m],$$

$$A \approx \hat{A} = u_1 s_1 v_1^T, \quad u_1 = [u_{11}\ u_{21}\ \dots\ u_{m1}]^T,$$

where $u_1$ and $v_1$ are the first columns of U and V
respectively. Thus A is approximated by a rank-1 matrix $\hat{A}$.
If $u_{i1}$ is the i-th element of $u_1$, the i-th row of $\hat{A}$ is
given by $u_{i1} s_1 v_1^T$. $\hat{A}$ thus represents the principal
periodic pattern in A. The points to note are as follows.

(a) the rows of $\hat{A}$ will contain repeating periods or
patterns, there being m periods, each n long,

(b) $v_1$ represents the basic pattern of the periods, which
is the same for all the periods,

(c) the elements of $u_1 s_1$ represent the scaling between the
rows, the i-th row of $\hat{A}$ being weighted by $u_{i1} s_1$.

The objective is to estimate the average row-energy
periodic pattern of $\hat{A}$:

$$\hat{y}_{av} = \hat{u}\hat{s}\hat{v}^T.$$
Average energy pattern

The energy contained in the i-th row of $\hat{A}$ is given by

$$E_i = [u_{i1} s_1 v_1^T][u_{i1} s_1 v_1^T]^T = u_{i1}^2\,[s_1 v_1^T][s_1 v_1^T]^T.$$

Hence the average energy contained in a row of $\hat{A}$ is given by

$$E_{av} = \frac{1}{m}\sum_{i=1}^{m} E_i = \frac{1}{m}\,u_1^T u_1\,s_1^2\,v_1^T v_1 = \left[\frac{s_1}{\sqrt{m}}\right]^2,$$

since $u_1^T u_1 = 1$ and $v_1^T v_1 = 1$. Hence the desired average
energy periodic pattern is given by

$$\hat{y}_{av} = \frac{1}{\sqrt{m}}\,s_1 v_1^T. \tag{14.5.1}$$


Remarks 

(a) The present method is conceptually different from time 
averaging. Here, the pattern is extracted through the 
separation of the noise, which is eliminated as an algebra- 
ically orthogonal component to the pattern of interest. 
Simple averaging being equivalent to low(frequency)-pass FIR
filtering, any noise which is orthogonal to the signal
cannot be separated by averaging, if the frequency
components of the noise are also present in the signal.

(b) Besides pattern estimation, one direct application of
the proposed method is data compression. For example, if one
is interested in long-term storage of the ECG data, only one
element ($s_1$) and one vector ($v_1$) will be required to be sto-
red to represent the ECG pattern over a short span of time.

(c) Where there are more than one (say, g) singular values
dominant, the estimated pattern will be given by

$$\hat{y}_{av} = \frac{1}{\sqrt{m}}\sum_{i=1}^{g} s_i v_i^T, \tag{14.5.2}$$

if the elements of $u_i$ do not change sign.
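A minimal NumPy sketch of the pattern estimate follows. It assumes the average energy pattern is the sum of $s_i v_i^T$ over the g dominant components, scaled by $1/\sqrt{m}$, so that its energy equals the average row energy. Numerical SVD routines return each pair $(u_i, v_i)$ with an arbitrary common sign, so the sketch orients each pair to make the row weights predominantly positive. That sign convention and the function name are illustrative choices.

```python
import numpy as np

def average_energy_pattern(A, g=1):
    """Pattern with the average row energy of the g dominant components."""
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    pattern = np.zeros(A.shape[1])
    for i in range(g):
        # Orient the (u_i, v_i) pair so that the row weights are positive.
        sign = 1.0 if U[:, i].sum() >= 0 else -1.0
        pattern += sign * s[i] * Vt[i]
    return pattern / np.sqrt(m)

base = np.array([1.0, 2.0, 3.0, 2.0])
scales = np.array([1.0, 1.1, 0.9])
A = np.outer(scales, base)          # three scaled repetitions of one pattern
estimate = average_energy_pattern(A)
print(estimate)
```

For exactly repeating rows that differ only in scale, the estimate is the base pattern multiplied by the root-mean-square of the row scalings, so its energy equals the mean of the row energies.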


Example 14.5.2 Estimation of the daily electric power load 
pattern on a substation 

The hourly data on the electric power load on a substation
(Appendix 7D) show a daily periodicity as well as weekly
and yearly periodicities. Consider the problem of estimating
the Monday-load-pattern over a month.

Figure 14.5.2 The Monday’s load pattern of electrical
power over a period of 24 hours, showing the data patterns
and the estimated pattern.

The data for the four consecutive Mondays are arranged
into a 4×12 matrix A, and the load pattern is estimated
using (14.5.1). Here, only the most dominant decomposition
component was used for pattern estimation.


14.5.3 Selective Filtering 

In a data series or signal, besides the noise, there may be 
two or more information components; the signal component of 
interest may not be the most dominant SV-decomposed 
component. In other words, the signal component of interest 
may be mixed with stronger unwanted auxiliary signals. 
Selective filtering concerns separation of a signal 
component by selective elimination of unwanted decomposition 
components. 

In the case of a periodic signal arranged into a
matrix, if the signal components are mutually orthogonal,
they will be associated with specific singular values $s_i$ and
hence with the corresponding SV-decomposition components
$u_i s_i v_i^T$, where

$$A = USV^T = \sum_{i=1}^{r} u_i s_i v_i^T.$$

The selective separation of the components $u_i s_i v_i^T$ can be
performed directly in such cases.

In practice, signal components are rarely orthogonal to
each other, so the following features of a data matrix
decomposed by SVD are to be noted.

(a) A predominant first singular value is indicative of the
presence of a strong periodic component given by
$u_1 s_1 v_1^T$.

(b) If the last few singular values are insignificantly 
small, they are associated with the noise components. 

(c) The singular values which are in between may contain 
useful information. 

A case study illustrating the application of selective 
filtering follows. 


14.5.4 Case Study: Fetal ECG Extraction from Maternal ECG 

The ECG signal recorded from the abdominal lead on the
mother shows a fetal ECG component submerged within the
maternal ECG. From a clinical point of view, the fetal ECG
contains useful information about fetal health. Besides the
interfering maternal ECG signal, there is noise due to
maternal muscle contractions, motion artifacts due to the
movements of the baby and the mother, etc. Thus extraction of
the fetal ECG signal can be a difficult problem.

Widrow et al (1975) use the method of adaptive noise
cancellation to produce the fetal ECG signal component. The
mother’s ECG signal obtained from the chest lead is used to
cancel the maternal ECG signal component from the composite
ECG signal obtained from the abdominal lead of the mother.
Another possible approach is to deduce the fetal ECG signal
through weighted summation of a number of ECG signals
obtained from suitably positioned electrodes on the mother
(Callaerts et al, 1990).

The method discussed here uses only one signal, that is 
the maternal abdominal ECG signal, from which the fetal ECG 
component is extracted through selective filtering. 
Figure 14.5.3 Maternal ECG (composite) signal from the
abdominal lead of the mother, showing maternal and 
fetal ECG components. 


Basic principle 

The maternal abdominal ECG signal (MAECG) is composed of the 
maternal ECG (MECG) signal, the fetal ECG (FECG) signal, the 
high frequency noise components and the trend component due 
to low frequency offsets and drifts. The objective is to 
extract a reasonably clean FECG signal from the composite
MAECG signal. 

Note that (see Fig.14.5.3) the MECG and FECG components 
are nearly periodic, with marginal variations in the 
periodic patterns and period lengths; the MECG and FECG are 
usually asynchronous with each other. The FECG can be 
extracted as follows. 

(1) First, the composite MAECG signal is bidirectionally 
filtered (Sec.14.3.1) to separate the low frequency drift 
component from the data. 

(2) The optimum period length $n_M$ for the maternal ECG com-
ponent in the composite MAECG signal is determined by
inspection or by SVR spectrum (Appendix 11). In the case of
the former, the maximum period length may be chosen. For the
SVR spectrum, the length corresponding to the highest peak
is chosen.

(3) The composite MAECG data are arranged into a matrix A
with a row length $n_M$ (see the following Remark). Each
maternal ECG period occupies a row of A with the peak
excursions lying in the same column. SVD is performed on A:
$A = U_A S_A V_A^T$; the most dominant decomposition component
of A, $A_M = u_{A1} s_{A1} v_{A1}^T$, represents the maternal ECG
signal. The fetal ECG signal remains contained in the
residual, $A_R = A - A_M$.

(4) The data sequence $\{y_R(.)\}$ is formed from the
consecutive rows of $A_R$. The most dominant nearly periodic
component in $\{y_R(.)\}$ will be the fetal ECG component. The
period length $n_F$ of the fetal ECG component is determined
through inspection or by SVR spectrum analysis of $\{y_R(.)\}$.

(5) $\{y_R(.)\}$ is arranged into a matrix B with the row length
equal to $n_F$. SVD is performed on B: $B = U_B S_B V_B^T$. The most
dominant SV-decomposed component of B, $B_F = u_{B1} s_{B1} v_{B1}^T$,
is the extracted FECG component (the consecutive rows of $B_F$
contain the consecutive periods of FECG).
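Steps (3) to (5) can be outlined in NumPy as below. This is a simplified sketch rather than the full procedure: it assumes the trend has already been removed, that the period lengths `n_m` and `n_f` are known and fixed, and that the peaks are already aligned. No stretching, compression or padding of individual periods is attempted. The function names are hypothetical.

```python
import numpy as np

def dominant_component(signal, period):
    """Arrange the signal row-wise with `period` columns and return the
    most dominant SVD component u1*s1*v1^T as a flattened sequence.
    A trailing incomplete period is dropped."""
    rows = len(signal) // period
    M = np.asarray(signal[: rows * period], dtype=float).reshape(rows, period)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (s[0] * np.outer(U[:, 0], Vt[0])).ravel()

def extract_fetal(maecg, n_m, n_f):
    """Return (maternal, residual, fetal) sequences for fixed periods."""
    usable = (len(maecg) // n_m) * n_m
    y = np.asarray(maecg[:usable], dtype=float)
    maternal = dominant_component(y, n_m)        # step (3): A_M
    residual = y - maternal                      # step (3): A_R = A - A_M
    fetal = dominant_component(residual, n_f)    # steps (4)-(5): B_F
    return maternal, residual[: len(fetal)], fetal

# Synthetic illustration: a strong component of period 8 plus a weaker
# component of period 5 (the values are arbitrary).
strong = np.tile([0.0, 1.0, 5.0, 1.0, 0.0, -1.0, 0.0, 0.0], 10)
weak = np.tile([0.0, 0.6, -0.3, 0.0, -0.3], 16)
maternal, residual, fetal = extract_fetal(strong + weak, 8, 5)
print(maternal[:8])
```

With real recordings the quality of the separation depends on how nearly periodic the two components are and on the alignment of the peaks, as noted in the Remarks.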

Remarks 

(a) Due to physiological reasons the period lengths of both
the MECG and the FECG may vary to a certain extent. To
accommodate each period in a row of the matrix A or B in
step (3) above, two approaches may be followed.

(i) A period may be stretched to the chosen period length
by equivalent interpolation or extrapolation using
(11.2.1). This operation is required at the time of
forming the data matrix. The reverse operation is done
at the time of forming the data sequence from the sepa-
rated matrix (like $A_M$) or the residual matrix (like $A_R$).

(ii) If the maximum period length is chosen as the row
length, when the row falls short of data at the
terminals of a row, interpolated values may be used. The
reverse operation is performed while forming the data
sequence from the matrix.

(b) In the formation of both A and B, it is necessary to 
ensure that the peaks of the prime ECG signals lie in the 
same column (or the same few columns) in the different rows 
of the matrix. It is also important that the operation of 
interpolation or extrapolation or padding applied in forming 
the matrix is reversed at the time of forming the extracted 
data sequence or the residual data sequence from the 
respective matrices. 

(c) There is no frequency domain method of filtering which
is equivalent to the SVD based selective filtering applied
here.

Figure 14.5.4 (a) The trend removed composite maternal
ECG signal shown in Fig.14.5.3. (b) The SVR spectrum of the
composite ECG (MAECG) signal. (c) The extracted maternal ECG
(MECG) signal.

Example 14.5.4 Fetal ECG extraction using real-life data

The data used in this example (Appendix 140) were obtained
from a mother with a gestation period of 37 weeks. As shown
in Fig.14.5.3, the series shows the fetal ECG signal mixed
with the maternal ECG signal and riding on an interfering
low frequency trend component.

To remove the low frequency trend in the composite
maternal abdominal ECG signal, it was bidirectionally
filtered with the filter pole at 0.88 (i.e. a = 0.88 in
(14.3.1)). The filtered signal $\{y(\cdot)\}$ is shown in
Fig.14.5.4(a). The SVR spectrum (Fig.14.5.4(b)) of
$\{y(\cdot)\}$ showed a peak at a period length of 81; the
data showed a variation of the period length from 79 to 90.
So wherever necessary, the periods were uniformly
extrapolated or compressed (using (11.2.1)) and a 12x81
matrix A was formed. A was SV-decomposed and $A_M$,
representing the maternal ECG component (Fig.14.5.4(c)), was
extracted.

The SVR spectrum of the residual data sequence showed
peaks at period lengths of 25, 49 and 98
(Fig.14.5.5(a), (b)). Since there was no peak at 75, $n_F$
was chosen as 49, and a 20x49 matrix B was formed. B was
SV-decomposed and the fetal ECG component $B_F$ was extracted
as shown in Fig.14.5.5(c).


14.6 CONCLUSIONS 

Some methods for separation or extraction of usable 
information from the available data have been presented. 
First, optimum smoothing in state-space framework was 
discussed, which was to familiarize the reader with the 
various issues connected with smoothing. This was followed
by studies on bidirectional filtering, which can be used to 
perform smoothing with minimum lag or phase shift. The 
bidirectional processing is a characteristic feature which 
is incorporated in many algorithms for bias-free processing; 
for example the fixed interval smoother uses similar 
forward-backward passes, and the centred moving average 
(Sec.4.2.3 and Appendix 4) used in the time series analysis 
is also effectively similar in concept. 

Figure 14.5.5 (a) The residual series after removal
of the maternal ECG component. (b) The SVR spectrum of the
residual series. (c) The extracted FECG signal.

Orthogonal transformation offers a numerically robust
method of smoothing and signal extraction. The smoothing is
performed through the elimination of insignificant
components; the methods presented here are based
on singular value decomposition (SVD). Besides smoothing,
the potential of the SVD based methods in signal extraction 
and pattern estimation in a noisy environment was also 
demonstrated through application studies. The approach 
depends on the repetitive nature of the signal component of 
interest, and hence the data are appropriately configured 
for analysis. A case study on fetal ECG extraction from 
maternal ECG showed that extraction is possible with only 
one signal (i.e. the maternal ECG signal from the abdominal 
lead), and irrespective of low signal to noise ratio; the 
other available methods of fetal ECG extraction require one 
or more additional signals. The application of orthogonal 
transformation for smoothing and filtering is an area of 
active research, and the present study has been only a 
glimpse of its enormous potential.


APPENDIX 1


VECTOR AND MATRIX OPERATIONS

1A.1 Basic Definitions

An mxn matrix A is a rectangular array of elements $a_{ij}$
arranged in m rows and n columns:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$

A can also be expressed as a column vector

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix},$$

where each element of the column will represent a row
vector, for example

$$a_2 = [a_{21} \;\; a_{22} \;\; \cdots \;\; a_{2n}].$$

Two matrices A and B can be added (or subtracted from each
other), if they are of the same size:

$$A + B = C;$$

the resulting matrix C will also be of the same size as A
and B.

The transpose of an mxn matrix A is given by the nxm
matrix $A^T = B$ which satisfies the relationship:

$$a_{ij} = b_{ji}, \quad 1 \le i \le m, \; 1 \le j \le n,$$

where $a_{ij}$ and $b_{ji}$ are the elements of A and B respectively.
The following transpose relations also hold:

(a) $(AB)^T = B^T A^T$,

(b) $(A + B)^T = A^T + B^T$,

(c) $\begin{bmatrix} A & B \\ C & D \end{bmatrix}^T = \begin{bmatrix} A^T & C^T \\ B^T & D^T \end{bmatrix}$.

A square matrix A is symmetric if $A = A^T$, that is $a_{ij} = a_{ji}$.
A matrix is said to be skew symmetric if $A = -A^T$, that is
$a_{ij} = -a_{ji}$, and the diagonal elements are all equal to zero.

1A.2 Matrix Multiplication

In matrix multiplication, the basic operation is the
multiplication of a row vector with a column vector of the
same size, e.g.

$$[a \;\; b \;\; c] \begin{bmatrix} d \\ e \\ f \end{bmatrix} = ad + be + cf.$$

The product is a scalar, which is referred to as the inner
product of the two vectors.

If the inner product of two vectors is zero, the
vectors are mutually orthogonal. If an mxn matrix A
multiplies with an nxr matrix B to produce the mxr matrix C,
then the elements of C are given by

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},$$

where $a_{ik}$ and $b_{kj}$ are the elements of matrix A and B
respectively.

Matrix multiplication depends on the order of the
matrices. AB is known as premultiplication of B by A or as
postmultiplication of A by B. The multiplication

$$C = AB$$

is possible only if the row length of A is the same as the
column length of B. In general $AB \ne BA$, even if both the
matrices A and B are square.

Some general rules of matrix multiplication follow:

(a) A(BC) = (AB)C (associative property).

(b) A(B+C) = AB + AC (distributive property).

(c) In general, $AB \ne BA$ (usually not commutative).

(d) AB = 0 does not imply that A or B is equal to 0.
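Rules (c) and (d) can be illustrated with small integer matrices. The following sketch uses nested Python lists; the helper name `matmul` is an assumption made for this illustration only.

```python
def matmul(A, B):
    """Product C = AB with c_ij = sum_k a_ik * b_kj."""
    n = len(B)
    assert all(len(row) == n for row in A), "row length of A must equal column length of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
AB = matmul(A, B)      # columns of A interchanged
BA = matmul(B, A)      # rows of A interchanged, so AB differs from BA

N1 = [[0, 1], [0, 0]]
N2 = [[1, 0], [0, 0]]
Z = matmul(N1, N2)     # the zero matrix, although N1 and N2 are both nonzero
```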

If a column vector of length m is multiplied with a row
vector of length n, the product is a matrix of size mxn:

$$\begin{bmatrix} a \\ b \\ c \end{bmatrix} [d \;\; e \;\; f \;\; g] = \begin{bmatrix} ad & ae & af & ag \\ bd & be & bf & bg \\ cd & ce & cf & cg \end{bmatrix}.$$

This result is referred to as an outer product or a dyadic
product.

1A.3 Determinant

Every square matrix has a determinant, which is a scalar
quantity computed as follows.

(a) For a 2x2 matrix A:

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$$

the determinant, indicated as |A|, is given by

$$|A| = a_{11} a_{22} - a_{21} a_{12}.$$

(b) If A is a 3x3 matrix:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},$$

$$|A| = a_{11}(a_{22}a_{33} - a_{32}a_{23}) + a_{12}(a_{23}a_{31} - a_{33}a_{21}) + a_{13}(a_{21}a_{32} - a_{31}a_{22}).$$

The determinant of a matrix A must be nonzero for it to have
an inverse $A^{-1}$. There are three basic properties of
determinants:

(i) If A and B are both square matrices, |AB| = |A||B|.

(ii) $|A| = |A^T|$.

(iii) $\left| \begin{bmatrix} A_1 & A_2 \\ 0 & A_4 \end{bmatrix} \right| = |A_1| \, |A_4|$,

if $A_1$ and $A_4$ are square matrices.
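The 3x3 expansion and property (i) can be checked numerically. This is a small sketch with integer matrices chosen for the illustration; `det3` and `matmul3` are hypothetical helper names.

```python
def det3(A):
    """Determinant of a 3x3 matrix, using the expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[2][1] * A[1][2])
            + A[0][1] * (A[1][2] * A[2][0] - A[2][2] * A[1][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[2][0] * A[1][1]))

def matmul3(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]
B = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]
det_A = det3(A)
det_B = det3(B)
det_AB = det3(matmul3(A, B))    # equals det_A * det_B by property (i)
A_T = [[A[j][i] for j in range(3)] for i in range(3)]
det_AT = det3(A_T)              # equals det_A by property (ii)
```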

1A.4 Matrix Inversion 

If A is a square matrix, its inverse $A^{-1}$ is defined by the
property:

$$AA^{-1} = I, \quad A^{-1}A = I.$$

If A has an inverse, it is called nonsingular, otherwise it
is called singular.

Some basic rules of matrix inversion follow.

(a) $[AB]^{-1} = B^{-1}A^{-1}$.

(b) If $B = A^{-1}$, then $B^T = [A^T]^{-1}$.

(c) If A is symmetric, $A^{-1}$ is symmetric.

(d) $A^{-1}$ is unique, if it exists.

(e) $|AA^{-1}| = |A|\,|A^{-1}| = 1$, or $|A^{-1}| = 1/|A|$.

Some useful matrix inverses follow.

(a) $\begin{bmatrix} A & 0 \\ C & B \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & 0 \\ -B^{-1}CA^{-1} & B^{-1} \end{bmatrix}$

(b) $\begin{bmatrix} A & D \\ 0 & B \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & -A^{-1}DB^{-1} \\ 0 & B^{-1} \end{bmatrix}$

(c) $\begin{bmatrix} A & D \\ C & B \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + ER^{-1}F & -ER^{-1} \\ -R^{-1}F & R^{-1} \end{bmatrix}$

where $R = [B - CA^{-1}D]$, $E = A^{-1}D$, and $F = CA^{-1}$.
If $C = D^T$ (with A and B symmetric),

$$\begin{bmatrix} A & D \\ D^T & B \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + ER^{-1}F & -ER^{-1} \\ -(ER^{-1})^T & R^{-1} \end{bmatrix}.$$

Matrix inversion lemma

If A and C are nonsingular square matrices,

$$[A + BCD]^{-1} = A^{-1} - A^{-1}B[C^{-1} + DA^{-1}B]^{-1}DA^{-1}.$$

This result can be shown to be valid by direct substitution
as follows. Let

$$DA^{-1}B = M.$$

Then

$$[A + BCD]\,[A^{-1} - A^{-1}B[C^{-1} + DA^{-1}B]^{-1}DA^{-1}]$$
$$= I + BCDA^{-1} - B[C^{-1} + M]^{-1}DA^{-1} - BCM[C^{-1} + M]^{-1}DA^{-1}$$
$$= I + BCDA^{-1} - BC[C^{-1} + M][C^{-1} + M]^{-1}DA^{-1}$$
$$= I + BCDA^{-1} - BCDA^{-1}$$
$$= I.$$

Corollaries

(a) If A and C are nonsingular square matrices,

$$[A - BCD]^{-1} = A^{-1} - A^{-1}B[DA^{-1}B - C^{-1}]^{-1}DA^{-1}.$$

(b) $[I + PBCD]^{-1}P = P - PB[DPB + C^{-1}]^{-1}DP$,

where C is a nonsingular square matrix and P is a square
positive semidefinite matrix.

(c) $[A + uv^T]^{-1} = A^{-1} - \dfrac{(A^{-1}u)(v^TA^{-1})}{1 + v^TA^{-1}u}$,

where A is a nonsingular square matrix and u and v are
column vectors.
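The rank-one form of the matrix inversion lemma, that is the inverse of $A + uv^T$ obtained from $A^{-1}$, can be verified with exact rational arithmetic. The sketch below uses a 2x2 case; `inv2` and `rank_one_inverse` are hypothetical helper names and the numbers are arbitrary.

```python
from fractions import Fraction as Fr

def inv2(A):
    """Inverse of a nonsingular 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def rank_one_inverse(A_inv, u, v):
    """Inverse of (A + u v^T) computed from A^{-1} by the rank-one lemma."""
    n = len(u)
    Au = [sum(A_inv[i][k] * u[k] for k in range(n)) for i in range(n)]   # A^{-1} u
    vA = [sum(v[k] * A_inv[k][j] for k in range(n)) for j in range(n)]   # v^T A^{-1}
    denom = 1 + sum(v[k] * Au[k] for k in range(n))                      # 1 + v^T A^{-1} u
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]

A = [[Fr(2), Fr(1)], [Fr(1), Fr(3)]]
u = [Fr(1), Fr(2)]
v = [Fr(3), Fr(1)]
direct = inv2([[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)])
via_lemma = rank_one_inverse(inv2(A), u, v)
```

The lemma avoids a fresh inversion when A changes by a rank-one term, which is the situation in recursive least squares.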


1A.5 Some Special Forms 


A matrix U is called upper triangular, if all the entries
below the principal diagonal are zero:

$$U = \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}.$$

U need not be a square matrix. Similarly when all elements
above the principal diagonal are zeros, the matrix is called
lower triangular.

A matrix D is called diagonal, if all entries other
than the diagonal ones are zero:

$$D = \begin{bmatrix} d_{11} & 0 & 0 & 0 \\ 0 & d_{22} & 0 & 0 \\ 0 & 0 & d_{33} & 0 \end{bmatrix}.$$

The inverse of a square diagonal matrix is a matrix with
reciprocal diagonal terms:

$$D^{-1} = \begin{bmatrix} 1/d_{11} & 0 & 0 \\ 0 & 1/d_{22} & 0 \\ 0 & 0 & 1/d_{33} \end{bmatrix}.$$

If all the diagonal entries of a square diagonal matrix are
unity, the matrix is called an identity matrix:

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

If the rows (or columns) of an identity matrix are
interchanged, it becomes a permutation matrix. When a
matrix A is postmultiplied by a permutation matrix P, the
columns of A are interchanged in the same order as the
identity matrix rows are interchanged into P. For example,
let

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix},$$

so

$$AP = \begin{bmatrix} a_{13} & a_{11} & a_{12} \\ a_{23} & a_{21} & a_{22} \\ a_{33} & a_{31} & a_{32} \end{bmatrix}.$$

The interchanging will be row-wise if A is premultiplied by
P.

1A.6 Norms 

The magnitude of a vector is expressed in terms of its norm.
For any real vector x,

$$x = [x_1 \;\; x_2 \;\; \cdots \;\; x_n]^T,$$

its length or norm, denoted by $\|x\|$, is defined as

$$\|x\| = \sqrt{x^Tx} = \Big(\sum_{i=1}^{n} x_i^2\Big)^{1/2}.$$

The vector norm is also called the Euclidean norm. The
vector norm has the following properties.

(a) $\|ax\| = |a| \cdot \|x\|$ for all real a and all vectors x.

(b) $\|x\| > 0$ unless x = 0, when $\|x\| = 0$.

(c) $\|x+y\| \le \|x\| + \|y\|$ for all vectors x, y.

(d) $\|x\| = \|Ux\|$, for all orthogonal matrices U, $U^TU = I$.

The norm and the inner product are related through the
Schwarz inequality as

$$|x^Ty| \le \|x\| \cdot \|y\|,$$

for any vectors x and y.

Similar to the vector norm, matrix norms provide a
measure of the magnitude of the matrix. The norm of an nxn
real matrix A is defined as

$$\|A\| = \max_{\|x\|=1} \|Ax\|,$$

the maximum being taken over all vectors x of unit norm. The
matrix norm has the following properties:

(a) $\|Ax\| \le \|A\| \cdot \|x\|$,

(b) $\|AB\| \le \|A\| \cdot \|B\|$,

(c) $\|A+B\| \le \|A\| + \|B\|$,

for all matrices A and B and vector x.

There are many other norms that can be defined for
vectors and matrices. Some of the popularly used norms are
as follows.

(a) Frobenius norm of any mxn matrix A is expressed as

$$\|A\|_F = \Big(\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2\Big)^{1/2}.$$

(b) 1-norm and $\infty$-norm of an n-vector x are given by

$$\|x\|_1 = \sum_{i=1}^{n} |x_i|, \qquad \|x\|_\infty = \max_{1 \le i \le n} |x_i|.$$

1-norm and $\infty$-norm of an mxn matrix A are given by

$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}|, \qquad \|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}|.$$

(c) 2-norm of an mxn matrix A is given by the square root
of the largest eigenvalue of $A^TA$, that is the largest
singular value of A.
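The Frobenius norm, the 1-norm and the $\infty$-norm of a matrix follow directly from their definitions. The sketch below uses a small example matrix; the function names are assumptions made for this illustration.

```python
import math

def frobenius_norm(A):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

def one_norm(A):
    """Maximum absolute column sum."""
    return max(sum(abs(A[i][j]) for i in range(len(A))) for j in range(len(A[0])))

def inf_norm(A):
    """Maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

A = [[1, -2], [3, 4]]
norms = (frobenius_norm(A), one_norm(A), inf_norm(A))
```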


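1A.7 Differentiation

If the mxn matrix A ($= [a_{ij}]$) is a function of a variable t,

$$\frac{dA}{dt} = \begin{bmatrix} \dfrac{da_{11}}{dt} & \dfrac{da_{12}}{dt} & \cdots & \dfrac{da_{1n}}{dt} \\ \vdots & \vdots & & \vdots \\ \dfrac{da_{m1}}{dt} & \dfrac{da_{m2}}{dt} & \cdots & \dfrac{da_{mn}}{dt} \end{bmatrix}.$$

If the m-column vector x is a function of the n-row vector y,
where

$$x = [x_1 \;\; x_2 \;\; \cdots \;\; x_m]^T, \qquad y = [y_1 \;\; y_2 \;\; \cdots \;\; y_n],$$

then

$$\frac{\partial x}{\partial y} = \begin{bmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_m}{\partial y_1} & \dfrac{\partial x_m}{\partial y_2} & \cdots & \dfrac{\partial x_m}{\partial y_n} \end{bmatrix}.$$

The matrix or vector integration can be similarly expressed.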
1A.8 Matrix operations 

Matrix operations appear widely in the present text
explicitly or implicitly. Some of the prime operations are
listed in the following table.

Table 1A.8 Some common matrix operations

Matrix operation | Expression | Typical applications

Cholesky factorization of $A^TA$, A being an mxn matrix | $A^TA = LDL^T = GG^T$; L is lower triangular, D is diagonal, G is lower triangular | Matrix inversion, LS estimation

Singular value decomposition of mxn matrix A | $A = USV^T$; U and V are orthogonal matrices, S is diagonal | Matrix rank determination, inversion, data compression, signal decomposition, modelling, filtering, estimation and prediction

QR decomposition of mxn matrix A | $A = QR$; Q having orthogonal columns, R is upper triangular | Matrix inversion, LS estimation

QRcp factorization of A | $Q^T[A]P = R$; Q having orthonormal columns, R is upper triangular, P is a permutation matrix | Subset selection (first r columns of AP)

UD factorization of square symmetric matrix P | $P = UDU^T$; U is upper triangular, D is diagonal | Recursive updation of covariance matrix, inversion of functions of matrices


EXPONENTIAL FOURIER SERIES


The trigonometric Fourier series representation of a
periodic process f(t), with period T, was discussed in
Sec.2.5.1. The derivation of the exponential Fourier series
representation is presented here.

The sinusoidal functions can be expressed in terms of
exponential functions as follows:

$$e^{i\theta} = \cos\theta + i\sin\theta,$$

$$\cos\theta = \tfrac{1}{2}(e^{i\theta} + e^{-i\theta}),$$

$$\sin\theta = \tfrac{1}{2i}(e^{i\theta} - e^{-i\theta}),$$

$$i = \sqrt{-1}.$$

So the trigonometric Fourier series (2.5.2) can be expressed
as

$$f(t) = a_0 + \frac{1}{2}\sum_{n=1}^{\infty}\{a_n(e^{in\omega_0t} + e^{-in\omega_0t}) - ib_n(e^{in\omega_0t} - e^{-in\omega_0t})\},$$    (2A.1)

where $\omega_0 = 2\pi/T$. Introduce the complex coefficients:

$$g_n = \tfrac{1}{2}(a_n - ib_n), \qquad g_{-n} = \tfrac{1}{2}(a_n + ib_n).$$    (2A.2)

Substituting for $a_n$ and $b_n$ from (2.5.4) in (2A.2),

$$g_n = \frac{1}{T}\int_{t_0}^{t_0+T} f(t)(\cos n\omega_0t - i\sin n\omega_0t)\,dt = \frac{1}{T}\int_{t_0}^{t_0+T} f(t)e^{-in\omega_0t}\,dt,$$

where $t_0$ is arbitrary. For n = 0,

$$g_0 = \frac{1}{T}\int_{t_0}^{t_0+T} f(t)\,dt = a_0,$$

as in (2.5.4). Hence, from (2A.2) and (2A.1),

$$f(t) = a_0 + \sum_{n=1}^{\infty}(g_n e^{in\omega_0t} + g_{-n}e^{-in\omega_0t}).$$

That is

$$f(t) = \sum_{n=-\infty}^{\infty} g_n e^{in\omega_0t}.$$    (2A.3)

Note that the magnitude and the phase of the components of
the exponential series (2A.3) are given by

$$g_n = |g_n|e^{i\phi_n}, \qquad g_{-n} = |g_n|e^{-i\phi_n}.$$

Hence

$$f(t) = g_0 + \sum_{n=1}^{\infty}\big(|g_n|e^{i(n\omega_0t+\phi_n)} + |g_n|e^{-i(n\omega_0t+\phi_n)}\big).$$

That is

$$f(t) = g_0 + \sum_{n=1}^{\infty} 2|g_n|\cos(n\omega_0t + \phi_n).$$    (2A.4)

In comparison with (2.5.5), the magnitude $|g_n|$ of the
frequency component $n\omega_0$ in the exponential series (2A.4) is
half of that of the sinusoidal component for the same
frequency, except for the average or constant component
which is the same in both.

The total power in f(t) within the interval $(t_0, t_0+T)$
is given by Parseval's Theorem (Papoulis, 1991) as

$$\frac{1}{T}\int_{t_0}^{t_0+T} f^2(t)\,dt = a_0^2 + \frac{1}{2}\sum_{n=1}^{\infty}(a_n^2 + b_n^2) = \sum_{n=-\infty}^{\infty}|g_n|^2.$$    (2A.5)
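The relation between the exponential and the trigonometric coefficients, and the power relation (2A.5), can be checked numerically. The sketch below is an illustration only: the test signal, the period T = 2 and the function name `exp_fourier_coeff` are assumptions, and the integral is approximated by a rectangular sum over one period.

```python
import cmath
import math

def exp_fourier_coeff(f, T, n, samples=2000):
    """Approximate g_n = (1/T) * integral over one period of f(t) exp(-i n w0 t)."""
    w0 = 2 * math.pi / T
    dt = T / samples
    total = sum(f(k * dt) * cmath.exp(-1j * n * w0 * k * dt) for k in range(samples))
    return total * dt / T

T = 2.0
w0 = 2 * math.pi / T

def f(t):
    # a0 = 1, a1 = 2, b2 = 3 in the trigonometric series; all other terms zero
    return 1.0 + 2.0 * math.cos(w0 * t) + 3.0 * math.sin(2 * w0 * t)

g = {n: exp_fourier_coeff(f, T, n) for n in range(-3, 4)}

# Mean power over one period, compared with the sum of |g_n|^2
samples = 2000
mean_power = sum(f(k * T / samples) ** 2 for k in range(samples)) / samples
power_from_g = sum(abs(c) ** 2 for c in g.values())
```

Here $|g_1|$ is half the amplitude of the first harmonic, as noted above for (2A.4).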
U-D COVARIANCE MEASUREMENT UPDATE

3A.1 Covariance Measurement Update 

In sequential state estimation, the covariance of the 
estimation error undergoes two different updates: (i) the 
measurement update, and (ii) the time update. The generic 
expressions for these two updates with respect to the Kalman 
filter state estimator are discussed in Sec.6.6. These 
updates appear in different areas of estimation and control. 

In RLS parameter estimation, at each time instant k,
the covariance P(k) is to be updated. The covariance
measurement update produces the Kalman gain k(k) based on
P(k-1), and then updates P(k-1) to P(k). The RLS estimator
uses k(k) to update the parameter estimates $\hat{\theta}(k)$ as:

$$\hat{\theta}(k) = \hat{\theta}(k-1) + k(k)\big(y(k) - h^T(k)\hat{\theta}(k-1)\big),$$    (3A.1)

where y(k) is the newly available measurement and h(k) is
the data vector.

Given the square symmetric covariance matrix P(k-1) and
h(k), the covariance measurement update computes the Kalman
gain k(k) and the updated covariance matrix P(k):

$$k(k) = P(k-1)h(k)\big(1 + h^T(k)P(k-1)h(k)\big)^{-1},$$    (3A.2)

and

$$P(k) = [I - k(k)h^T(k)]P(k-1).$$    (3A.3)

Refer to Sec.3.4.1 for further discussions. One of the
popular approaches for covariance measurement update is the
U-D factorization; detailed discussion of U-D covariance
factorization is given in Bierman (1977), and Thornton and
Bierman (1980).

Remarks

(a) With reference to the Kalman filter formulation
(6.6.9 - 6.6.13), the RLS parameter estimation is equivalent
to the state estimation problem for the process

$$x(k+1) = x(k),$$    (3A.4)

$$y(k) = h^T(k)x(k) + e(k),$$    (3A.5)

where x is the same as the parameter vector $\theta$ in (3A.1). The
relation (3A.4) renders the covariance time update
redundant, and hence RLS estimation requires the covariance
measurement update only.

(b) Here k(k) is in fact k(k|k-1), as all information up to
time (k-1) is used in producing k(k).


3A.2 U-D Factorization 

The positive semidefinite matrix P can be expressed as

$$P = UDU^T = [UD^{1/2}][UD^{1/2}]^T,$$    (3A.6)

where D is a diagonal matrix, and U is an upper triangular
matrix with 1's on the diagonal; $UD^{1/2}$ is the square root of
P. The factorization (3A.6) is referred to as U-D
factorization of the covariance matrix. So instead of
recursively updating P, its factors U and D may be updated and
propagated through the recursions.

U-D factorization belongs to the class of square-root
filtering, where instead of the covariance matrix P, its
square-root factors are updated. This approach reduces
round-off errors and increases numerical stability. U-D
factorization offers the special advantages that

(a) no explicit square-root extractions are required, and 

(b) D is available for diagnostic purposes. 


3A.3 Fortran mechanization of U-D measurement update 


Bierman (1977, p.100) considers U stored as a matrix; the
present implementation considers the relevant upper
triangular part of U stored into a vector U(), typically as

$$U = \begin{bmatrix} 1 & U(1) & U(2) & U(4) & \cdots \\ & 1 & U(3) & U(5) & \cdots \\ & & \ddots & & \end{bmatrix}.$$

The present routine produces the unweighted Kalman gain
$k'(k) = P(k-1)h(k)$, and ALPHAJ $= (1 + h^T(k)P(k-1)h(k))$, which
is a scalar; the actual Kalman gain is k(k) = k'(k)/ALPHAJ.

Inputs:

    X()      Data vector h; remains undestroyed on exit
    U()      Prior U factor arranged into a vector
    D()      Prior D factor, expressed as a vector
    NPAR     Size of X()
    FLAMDA   Scalar cost $\lambda$; FLAMDA=1 for estimation
    Y        Measurement of the dependent variable
    THETA()  Parameter vector $\theta$

Outputs:

    K()      Unweighted Kalman gain
    ALPHAJ   Scalar factor; Kalman gain = K()/ALPHAJ
    U()      Updated U factor
    D()      Updated D factor
    THETA()  Updated parameter vector

Measurement update routine

      FJ = X(1)
      VJ = D(1)*FJ
      K(1) = VJ
      ALPHAJ = FLAMDA + VJ*FJ

Comment: If FLAMDA is zero and if VJ or FJ are also zero,
ALPHAJ becomes zero, which causes numerical problems at the
next step. So, if FJ or VJ are likely to be zero, the lowest
acceptable value for FLAMDA may be constrained to a very
small number, say close to the machine zero, instead of
true zero (e.g., IF(FLAMDA.LT.1E-14) FLAMDA=1E-14).

      D(1) = D(1)*FLAMDA/ALPHAJ
      KF = 0
      KU = 0
      DO 4 J = 2,NPAR
         FJ = X(J)
         DO 41 I = 1,J-1
            KF = KF+1
   41    FJ = FJ + X(I)*U(KF)
         VJ = FJ*D(J)
         K(J) = VJ
         AJLAST = ALPHAJ
         ALPHAJ = AJLAST + VJ*FJ
         D(J) = (D(J)*AJLAST)/(ALPHAJ*FLAMDA)
         PJ = -FJ/AJLAST
         DO 42 I = 1,J-1
            KU = KU+1
            W = U(KU) + K(I)*PJ
            K(I) = K(I) + U(KU)*VJ
   42    U(KU) = W
    4 CONTINUE

Comment: The following part is required in the case of RLS
parameter estimation.

      PERR = Y
      DO 6 I = 1,NPAR
    6 PERR = PERR - THETA(I)*X(I)
      DO 8 I = 1,NPAR
    8 THETA(I) = THETA(I) + PERR*K(I)/ALPHAJ

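For readers who wish to experiment, the measurement update loop can be transcribed into Python as sketched below. This is a line-by-line transcription for illustration, not a tested replacement for the Fortran routine; the names `ud_update` and `ud_to_matrix` and the numerical values are assumptions, and the check is made for FLAMDA = 1 only.

```python
def ud_update(x, u, d, flamda=1.0):
    """U-D measurement update; u holds the upper part of U column by column.

    Returns (k, alphaj, u_new, d_new), where k is the unweighted gain P(k-1)h(k).
    """
    n = len(x)
    u, d = list(u), list(d)
    k = [0.0] * n
    fj = x[0]
    vj = d[0] * fj
    k[0] = vj
    alphaj = flamda + vj * fj
    d[0] = d[0] * flamda / alphaj
    kf = ku = 0
    for j in range(1, n):
        fj = x[j]
        for i in range(j):              # f_j = x_j + sum_i x_i * U(i, j)
            fj += x[i] * u[kf]
            kf += 1
        vj = fj * d[j]
        k[j] = vj
        ajlast = alphaj
        alphaj = ajlast + vj * fj
        d[j] = d[j] * ajlast / (alphaj * flamda)
        pj = -fj / ajlast
        for i in range(j):              # update column j of U and the gain
            w = u[ku] + k[i] * pj
            k[i] += u[ku] * vj
            u[ku] = w
            ku += 1
    return k, alphaj, u, d

def ud_to_matrix(u, d):
    """Rebuild P = U D U^T from the vector-stored factors."""
    n = len(d)
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    idx = 0
    for j in range(1, n):
        for i in range(j):
            U[i][j] = u[idx]
            idx += 1
    return [[sum(U[i][m] * d[m] * U[j][m] for m in range(n)) for j in range(n)]
            for i in range(n)]

# Prior P = [[2.25, 0.5], [0.5, 1.0]] in factored form, data vector h = [1, 2]
k, alphaj, u1, d1 = ud_update([1.0, 2.0], [0.5], [2.0, 1.0])
P1 = ud_to_matrix(u1, d1)   # compare with P - (P h)(P h)^T / alphaj from (3A.2)-(3A.3)
```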
3A.4 Application Aspects 

Besides RLS parameter estimation, the use of the U-D
measurement update in LQ control (Sec.11.6.3, Eqn.(11.6.37),
and Sec.11.6.4), in generalized predictive control (Sec.12.8,
Eqn.(12.8.3 - 12.8.4)), and in the Koopmans-Levin method of
parameter estimation (Sec.3.5.3, Eqn.(3.5.21)) has already
been discussed. Further applications of the U-D measurement
update for matrix operations are possible as shown below.

Example 3A.3 Compute $[G^TG + M^TM]^{-1}t$, where G and M are
nxn matrices and t is a vector, using the U-D measurement
update routine.

Following (3.4.8 - 3.4.9), RLS estimation involves the
update

$$[P(k+1)]^{-1} = [P(k)]^{-1} + h(k)h^T(k).$$

So $[G^TG + M^TM]^{-1}$ can be computed as follows. Initialize
$P^{-1} = \lambda I$, where $\lambda$ is vanishingly small (say, 1E-14); that
is, fill the U() array with all zeros and the D() array with
$1/\lambda$ values. If $M = [m_1 \; \cdots \; m_i \; \cdots \; m_n]^T$, n calls to
the U-D measurement update, with $m_i$ (i = 1 to n) passed as the
data vector X(), will produce $P = UDU^T = [M^TM]^{-1}$. Similarly
further n calls to the U-D routine with each column of $G^T$
passed as the data vector X(), will lead to the update
$P = [G^TG + M^TM]^{-1}$.

Again the present routine produces

$$k'(k) = P(k-1)h(k),$$

which can be used next as follows. Define X() as the vector
t, and run the U-D measurement update once; K() will contain
the result $[G^TG + M^TM]^{-1}t$.


QR AND QRcp FACTORIZATION

The QR decomposition of an mxn matrix A with rank p is given
by

$$A = QR,$$

where Q is an mxp matrix with orthonormal columns and R is
a pxn upper triangular matrix. When m = n, Q and R are
square matrices, and Q is an orthogonal matrix. There are
three main approaches to QR decomposition: Gram-Schmidt
orthogonalization, Householder transformation, and Givens
rotation. The Gram-Schmidt orthogonalization has been used
in the present discussion. This appendix also presents the
mechanism for QR with column pivoting (QRcp) factorization.

QR factorization through Gram-Schmidt orthogonalization 

Starting with an mxm matrix A, the Gram-Schmidt
orthogonalization process produces the matrix
$Q = [q_1 \; q_2 \; \cdots \; q_m]$ having orthonormal column vectors
$q_i$ (that is $q_i^Tq_i = 1$, for i = 1 to m).

Consider a 3x3 matrix A:

$$A = [a_1 \;\; a_2 \;\; a_3],$$

from which

$$Q = [q_1 \;\; q_2 \;\; q_3],$$

with $Q^TQ = I$, is to be produced.

The first column vector $q_1$ is considered to be spanning
the same space as $a_1$; hence

$$q_1 = a_1/\|a_1\|.$$    (3B.1)

To determine the second vector $q_2$, start with $a_2$ and deduct
the component of $a_2$ in the direction of the first vector
(i.e., $a_1$ or $q_1$) as follows:

$$q_2' = a_2 - (q_1^Ta_2)q_1;$$

on subsequent normalization

$$q_2 = q_2'/\|q_2'\|.$$    (3B.2)

To compute $q_3$, which will be orthogonal to both $q_1$ and $q_2$,
the components of $a_3$ in the spaces spanned by $q_1$ and $q_2$ have
to be subtracted from $a_3$ as follows:

$$q_3' = a_3 - (q_1^Ta_3)q_1 - (q_2^Ta_3)q_2;$$

on subsequent normalization

$$q_3 = q_3'/\|q_3'\|.$$    (3B.3)

Thus the Gram-Schmidt orthogonalization of A produces the
matrix Q with orthonormal columns. The QR decomposition
follows naturally from the Gram-Schmidt orthogonalization as
follows.

Denote

$$r_{11} = \|a_1\|,$$
$$r_{12} = q_1^Ta_2, \quad r_{22} = \|q_2'\|,$$
$$r_{13} = q_1^Ta_3, \quad r_{23} = q_2^Ta_3, \quad r_{33} = \|q_3'\|.$$

So the relationship between the column vectors of A and Q
can be rewritten as

$$q_1 = \frac{a_1}{r_{11}},$$    (3B.4)

$$q_2 = \frac{a_2}{r_{22}} - \frac{r_{12}q_1}{r_{22}},$$    (3B.5)

$$q_3 = \frac{a_3}{r_{33}} - \frac{r_{13}q_1}{r_{33}} - \frac{r_{23}q_2}{r_{33}}.$$    (3B.6)

Again the vectors $a_i$ can be expressed as follows.

$$a_1 = r_{11}q_1,$$
$$a_2 = r_{12}q_1 + r_{22}q_2,$$
$$a_3 = r_{13}q_1 + r_{23}q_2 + r_{33}q_3,$$

that is

$$[a_1 \;\; a_2 \;\; a_3] = [q_1 \;\; q_2 \;\; q_3] \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{bmatrix}.$$

Generalizing (3B.4 - 3B.6),

$$q_j = \frac{a_j - r_{1j}q_1 - r_{2j}q_2 - \cdots - r_{(j-1)j}q_{j-1}}{r_{jj}},$$

where

$$r_{ij} = q_i^Ta_j \;\; (i < j), \quad \text{and}$$

$$r_{jj} = \|a_j - r_{1j}q_1 - r_{2j}q_2 - \cdots - r_{(j-1)j}q_{j-1}\|.$$

Remarks 

(1) For any mxn matrix $A = [a_1 \; \cdots \; a_n]$, the corresponding
Q with the orthonormal vectors $Q = [q_1 \; \cdots \; q_n]$ is such
that for r = 1, ..., n, the set $\{q_1, \ldots, q_r\}$ collectively
spans the same r dimensional space as the set $\{a_1, \ldots, a_r\}$.

(2) The number of nonzero diagonal elements of R
indicates the rank of the matrix, although the closeness to
rank-loss cannot be detected through QR factorization as
precisely as through SVD. R(j,j) being zero indicates the
j-th column of A being redundant, as it has no component in
the $q_j$ vector space which is orthogonal to the $q_i$ vector
spaces for $i \ne j$.

(3) Although the mechanism of orthogonalization is quite
transparent in Gram-Schmidt orthogonalization, from a
numerical point of view it is not well conditioned. A
numerically sound implementation is the modified weighted
Gram-Schmidt algorithm; this as well as other robust
implementations are detailed in Golub and Van Loan (1989) and
Lawson and Hanson (1974).

QRcp Factorization 

QRcp (that is, QR with column pivoting) factorization is used
to pivot the columns of a matrix in order of maximum
Euclidean norm in successive orthogonal directions, while QR
factorization is performed on the matrix. The implementation
of QRcp factorization has been discussed in Sec.3.6.2. The
mechanism of the rotation of the columns is discussed here.

The column vector of A with $\max(a_i^Ta_i)$ is first
selected, and is swapped with $a_1$. $q_1$, the unit vector in
the direction of $a_1$, is determined using (3B.1).

The second (or rotated) vector is the one maximizing

$$(a_j - q_1^Ta_jq_1)^T(a_j - q_1^Ta_jq_1),$$

which is swapped with $a_2$, and $q_2$ is computed as in (3B.2).

At the i-th stage of selection, the rotated vectors
($a_j^*$) are

$$a_j^* = a_j - (q_1^Ta_jq_1 + \cdots + q_{i-1}^Ta_jq_{i-1}), \quad i = 2 \text{ to } n, \; j = i \text{ to } n,$$

and the i-th selected vector is the one maximizing $a_j^{*T}a_j^*$. The
subsequent rotation within QR decomposition will be with
respect to this vector and so on. The selection is continued
for up to r stages, where r may be the rank of A or may be
specified based on other considerations (see for example
Sec.3.6.3). The sequence of successive selections is
registered in the permutation matrix P. The result is

$$Q^TAP = R,$$

where R is upper triangular. The matrix AP will have the r
columns of A appearing first in order of importance.

References

[1] Golub, G.H., and C.F. Van Loan (1989): Matrix
Computations, 2nd edn., The Johns Hopkins Univ. Press,
Baltimore.

[2] Lawson, C.L., and R.J. Hanson (1974): Solving Least
Squares Problems, Prentice Hall, Englewood Cliffs, N.J.



Appendix 4 


CENTRED MOVING AVERAGE 


The two basic purposes of Centred Moving Averaging (CMA) are 

(i) estimation of trend in a time series without any 
time-lag, and 

(ii) reduction of the effects of random or spurious noise 
associated with the data. 

From the signal processing point of view, CMA is akin to
smoothing or estimation of y(k) based on the data
$\{y(k-j), \ldots, y(k+i)\}$, i and j being positive integers.

The general consequence of averaging is low-pass
filtering (see Appendix 14A), that is the high frequency
components are attenuated whereas the low frequency
components are retained. Any averaging which uses the present
and the past data only will produce a time-lag in the
averaged data. In CMA, since both past and future data are
used, a lag-free estimate of the series is produced. Thus CMA
is similar to bidirectional filtering discussed in Sec.14.3.1.

In the usual form of CMA, the estimate of the time
series $\{y(\cdot)\}$ at time k is given by

$$\hat{y}(k) = \frac{y(k-N) + \cdots + y(k-1) + y(k) + y(k+1) + \cdots + y(k+N)}{2N+1};$$    (4A.1)

here (2N+1) is the length of the data window, which
characterizes the CMA.

The data may be exponentially weighted as follows:

$$\hat{y}(k) = \frac{\alpha^Ny(k-N) + \cdots + \alpha y(k-1) + y(k) + \alpha y(k+1) + \cdots + \alpha^Ny(k+N)}{1 + 2(\alpha + \alpha^2 + \cdots + \alpha^N)}, \quad 0 < \alpha \le 1.$$

The implication is that more importance is given to the 
value of the data closer to k, the point of estimation. 

If the data series has an inherent periodic component
of length N, CMA as in (4A.1) with an averaging window of
length N will be able to get rid of the periodicity, when N
is odd. If N is even, use the scheme

$$\hat{y}(k) = \frac{1}{2}\left[\frac{y(k-N/2) + \cdots + y(k+N/2-1)}{N} + \frac{y(k-N/2+1) + \cdots + y(k+N/2)}{N}\right].$$    (4A.2)

Figure 4A.1 Centred moving averaging of the German
unemployment series with (a) N = 10, (b) N = 12, showing the
actual data and the filtered data $\{\hat{y}(k)\}$.

From the low-pass filtering point of view, CMA with an
averaging window of N will eliminate any periodic component
with period length N, or its higher harmonics of period
length N/2, N/3, etc. For an even N, the scheme as in (4A.2)
ensures $\hat{y}(k)$ to be truly centred over the time-point k.
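The two averaging schemes can be sketched in a few lines. The code below is illustrative only: the function names `cma_odd` and `cma_even` and the synthetic series (a linear trend plus a zero-mean periodic pattern) are assumptions. For the even case the second window is taken to run from k-N/2+1 to k+N/2, so that each window holds N points.

```python
def cma_odd(y, N):
    """Centred average over the 2N+1 points y(k-N)..y(k+N), as in (4A.1)."""
    return {k: sum(y[k - N:k + N + 1]) / (2 * N + 1)
            for k in range(N, len(y) - N)}

def cma_even(y, N):
    """Centred average for an even window length N, as in (4A.2)."""
    h = N // 2
    out = {}
    for k in range(h, len(y) - h):
        first = sum(y[k - h:k + h]) / N           # y(k-N/2) .. y(k+N/2-1)
        second = sum(y[k - h + 1:k + h + 1]) / N  # y(k-N/2+1) .. y(k+N/2)
        out[k] = 0.5 * (first + second)
    return out

# Linear trend plus a zero-mean pattern of period 4 (even) and of period 3 (odd)
season4 = [1.0, -1.0, 2.0, -2.0]
y4 = [k + season4[k % 4] for k in range(20)]
trend4 = cma_even(y4, 4)       # the periodic part is removed, leaving the trend k

season3 = [1.0, 0.0, -1.0]
y3 = [k + season3[k % 3] for k in range(20)]
trend3 = cma_odd(y3, 1)        # window length 2N+1 = 3 matches the period
```

In both cases the estimate is defined only where the full window fits inside the data, so the first and last few points are not estimated.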

CMA is discussed in detail in Makridakis et al (1983). 

Example 4A.1 Centred moving averaging of the German
unemployment series (Appendix 7E)

This monthly data series (see Fig.14.3.3) has a yearly
periodicity, i.e. the period length concerned is 12. CMA
using (4A.2) with N=10 and N=12 is shown in Fig.4A.1. As
expected, unlike the case with N=12, CMA with N=10 fails to
eliminate the periodic components completely.

Reference 

[1] Makridakis, S., S.C. Wheelwright, and V.E. McGee 
(1983): Forecasting: Methods and Applications, 2nd. 
edn., John Wiley, New York. 





Appendix 5A 


RECURSION OF THE DIOPHANTINE EQUATION 
5A.1 Problem Statement 

Discrete-time modelling and prediction using
transfer-function models often involve solution of the
Diophantine equation:

$$C(q^{-1}) = E_p(q^{-1})A(q^{-1}) + q^{-p}F_p(q^{-1}), \quad p = 1, 2, \ldots$$

where

$$A(q^{-1}) = 1 + a_1q^{-1} + \cdots + a_nq^{-n},$$

$$C(q^{-1}) = 1 + c_1q^{-1} + \cdots + c_nq^{-n},$$

$$E_p(q^{-1}) = 1 + e_{(p)1}q^{-1} + \cdots + e_{(p)p-1}q^{-p+1},$$

$$F_p(q^{-1}) = f_{(p)0} + f_{(p)1}q^{-1} + \cdots + f_{(p)n-1}q^{-n+1}.$$

It is assumed that $A(q^{-1})$ and $C(q^{-1})$ are known; the
objective is to determine $E_p(q^{-1})$ and $F_p(q^{-1})$ for different
values of p recursively.

5A.2 Recursive Solution 
For p = 1 , 

Ep(q 1 ) = 1 and hence 

q~Vp(q -1 ) = C(q _1 ) - A(q _1 ). 

So, for i = 1 to n, 

= c t - aj. (5A.1) 

For p = 2 , 

Ep(q _1 ) = 1 + e (2)1 q" 1 

C(q _1 ) = A(q _1 ) + q” 1 e( 2 )iA(q _1 ) + q" 2 F p (q -1 ) 

That is 

—1 — j —1 —2 —3 

C(q ) - A(q ) = e( 2 ) 1 q + e( 2 ) 1 a 1 q + e( 2 )!a 2 q + 

-2 -3 

••• + f^lo^ + f (2)i ( I + ••• 
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So 

e (2) 1 = C 1 “ a l = f (l)0> 

^ (2)0 = ®2 ~ a 2 “ e (2)l a l = ^<1)1 “ e (2)l a l» 

f (2)l “ c 3 “ a 3 “ ®(2) l a 2 = f <1)2 “ e (2) l a 2* 

f (2)n-2 = c n “ ®n “ e (2)l a n-l = f (l)n-l “ e (2)l a n 
f (2)n-l = ” e (2)l a n = “ ©tell 3 !! 

(5A.2) 

Thus the parameters of Ep(q X ) and F p (q' 1 ) can be computed 
recursively. 

The general expression for the recursion can be developed as
follows. Given the parameters of the identity

    C(q^{-1}) = E_p(q^{-1}) A(q^{-1}) + q^{-p} F_p(q^{-1}),    (5A.3)

the objective is to compute the parameters of E_{p+1}(q^{-1})
and F_{p+1}(q^{-1}) in

    C(q^{-1}) = E_{p+1}(q^{-1}) A(q^{-1}) + q^{-p-1} F_{p+1}(q^{-1}).    (5A.4)

Let E_p(q^{-1}), F_p(q^{-1}), E_{p+1}(q^{-1}) and F_{p+1}(q^{-1}) be
represented by E, F, \hat{E} and \hat{F} respectively for simplicity.

Subtracting (5A.3) from (5A.4),

    0 = (\hat{E} - E) A + q^{-p} (q^{-1} \hat{F} - F).    (5A.5)

\hat{E} being of degree p,

    \hat{E} - E = \tilde{E} + q^{-p} e_p,    (5A.6)

where the degree of \tilde{E}, \delta\tilde{E} = p-1.

From (5A.5) and (5A.6),

    0 = \tilde{E} A + q^{-p} (e_p A + q^{-1} \hat{F} - F).    (5A.7)

Since \tilde{E} is of degree p-1 and the 1st term in A is 1,
\tilde{E} = 0. Hence from (5A.7),

    q^{-1} \hat{F} = F - e_p A,

or

    e_p = f_0,    (5A.8)

    \hat{f}_{i-1} = f_i - e_p a_i,    i = 1, ..., \delta F,    (5A.9)

    \hat{f}_{\delta F} = - e_p a_n,    (5A.10)

where \delta F = n-1 is the degree of F.

From (5A.6), also note that

    E_{p+1}(q^{-1}) = E_p(q^{-1}) + q^{-p} e_{(p+1)p},    (5A.11)

that is, the only additional term in E_{p+1}(q^{-1}) at every
recursion is e_p (i.e. e_{(p+1)p}), given by (5A.8).

Thus, starting with (5A.1), for p = 1, the parameters
of E_{p+1}(q^{-1}) and F_{p+1}(q^{-1}) can be recursively computed for
higher values of p using (5A.8) - (5A.10). The recursive
relationships for p = 2 are shown in (5A.2).
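
The recursion (5A.1), (5A.8) - (5A.11) can be sketched in Matlab as
below. This is only an illustration under stated assumptions, not the
implementation of Sec. 5A.3: the coefficient values, the horizon P and
all variable names are chosen for the example, and A and C are assumed
to have the same degree n, as in Sec. 5A.1. Coefficients are stored in
ascending powers of q^{-1}.

```matlab
% Illustrative coefficients (assumed values), ascending powers of q^-1
a = [1 -1.5 0.7];            % A(q^-1) = 1 - 1.5 q^-1 + 0.7 q^-2
c = [1  0.5 0.2];            % C(q^-1) = 1 + 0.5 q^-1 + 0.2 q^-2
n = length(a) - 1;           % common degree of A and C
P = 4;                       % largest horizon required (assumed)

E = 1;                       % E_1 = 1
F = c(2:end) - a(2:end);     % (5A.1): f_(1)i-1 = c_i - a_i
maxErr = 0;
for p = 1:P-1
    ep = F(1);                                 % (5A.8): e_p = f_0
    F  = [F(2:end) - ep*a(2:n), -ep*a(n+1)];   % (5A.9) and (5A.10)
    E  = [E, ep];                              % (5A.11)
    % Check the identity C = E_{p+1} A + q^-(p+1) F_{p+1}
    lhs = [c, zeros(1, p)];
    rhs = conv(E, a) + [zeros(1, p+1), F];
    maxErr = max(maxErr, max(abs(lhs - rhs)));
end
disp(E); disp(F); disp(maxErr);
```

After the loop, E and F hold the coefficients of E_P and F_P, and
maxErr should be at round-off level if the identity holds at every
step.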

5A.3 Implementation 

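An implementation of the solution of the Diophantine equation in
Matlab [1] follows.

function [E, F] = diophant(C, A, p)
% C and A hold polynomial coefficients in ascending powers of q^-1.
% E has p terms; F is the remainder after removing q^-p.

LC = length(C);
LA = length(A);

% Pad C (or A) with zeros so that deconv returns exactly p quotient terms
if LC == LA
    [E, F] = deconv([C zeros(1, p - 1)], A);
elseif LC < LA
    [E, F] = deconv([C zeros(1, LA - LC + p - 1)], A);
elseif LC - LA - p + 1 > 0
    [E, F] = deconv(C, [A zeros(1, LC - LA - p + 1)]);
else
    [E, F] = deconv([C zeros(1, p - 1 - LC + LA)], A);
end

% The first p entries of the remainder are zero; drop them
LF = length(F);
F = F(p + 1 : LF);

Lmin = p;
if LA == 1
    Lmin = min(p, LC);
end

E = E(1 : Lmin);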


Example: see Example 12.6 in Sec. 12.6.

Reference

[1] Matlab matrix software, The MathWorks, Inc., Sherborn,
MA.



Appendix 5B


PREDICTOR FOR A MULTIVARIABLE PROCESS


A predictor algorithm formulated for a multivariable 
stochastic process is presented here. 


Problem formulation 

Consider the multi-input multi-output process: 

    A(q^{-1}) y(k) = B(q^{-1}) u(k-d) + C(q^{-1}) e(k),    (5B.1)

where y is the r x 1 output vector, u is the s x 1 input vector; {e(k)}
is a sequence of independent and identically distributed r x 1
random vectors with zero mean and the covariance given by

E<e(k)e T (k)> = a. 


A, B and C are matrix polynomials:

    A(q^{-1}) = I + A_1 q^{-1} + ... + A_n q^{-n},

    B(q^{-1}) = B_0 + B_1 q^{-1} + ... + B_n q^{-n},

    C(q^{-1}) = I + C_1 q^{-1} + ... + C_n q^{-n},    (5B.2)

where A_i and C_i are r x r matrices and B_i are r x s matrices
for i = 1 to n. It is assumed that the zeros of det A(q^{-1})
and det C(q^{-1}) are strictly inside the unit circle. The
input vector u(k-d) is defined as,

    u(k-d) = [ q^{-d_1} u_1(k),  q^{-d_2} u_2(k),  ...,  q^{-d_s} u_s(k) ]^T.    (5B.3)


Define \hat{y}(k+p|k) as the p-step ahead predictor of y(k),
based on all the available information up to time k. If
p > d_i for 1 <= i <= s, as in (5B.3), the future input
strategy is assumed to be known.

The objective is to compute the minimum mean square
error predictor \hat{y}(k+p|k) minimizing the scalar cost

    J_m = E{\epsilon^T(k+p) \epsilon(k+p)},    (5B.4)

where the prediction error

    \epsilon(k+p) = y(k+p) - \hat{y}(k+p|k).    (5B.5)

Explicit design

Introduce the identity

    A^{-1}(q^{-1}) C(q^{-1}) = \tilde{C}(q^{-1}) \tilde{A}^{-1}(q^{-1})

                             = E(q^{-1}) + q^{-p} F(q^{-1}) \tilde{A}^{-1}(q^{-1}),    (5B.6)

where

    E(q^{-1}) = I + E_1 q^{-1} + ... + E_{p-1} q^{-p+1},    (5B.7)

    F(q^{-1}) = F_0 + F_1 q^{-1} + ... + F_{n-1} q^{-n+1}.    (5B.8)

Note that (a) the polynomial matrices \tilde{C}(q^{-1}) and \tilde{A}(q^{-1}) in
(5B.6) are not unique, and (b) det C = det \tilde{C}. From (5B.1),

    y(k+p) = A^{-1} B u(k+p-d) + A^{-1} C e(k+p).

Using (5B.6),

    y(k+p) = A^{-1} B u(k+p-d) + E e(k+p) + F \tilde{A}^{-1} e(k).    (5B.9)

Substituting for e(k) in (5B.9) from (5B.1),

    y(k+p) = A^{-1} B u(k+p-d) + E e(k+p) + F \tilde{A}^{-1} [C^{-1} A y(k) - C^{-1} B u(k-d)]

           = A^{-1} B u(k+p-d) + E e(k+p) + F \tilde{C}^{-1} y(k) - F \tilde{A}^{-1} C^{-1} B u(k-d)

           = F \tilde{C}^{-1} y(k) + [A^{-1} - q^{-p} F \tilde{A}^{-1} C^{-1}] B u(k+p-d) + E e(k+p).

Using (5B.6),

    y(k+p) = F \tilde{C}^{-1} y(k) + E C^{-1} B u(k+p-d) + E e(k+p)    (5B.10)

           = \hat{y}(k+p|k) + \epsilon(k+p), by definition.

Since E e(k+p) is orthogonal to the other terms on the right-
hand-side of (5B.10), the optimal predictor is given by

    \hat{y}(k+p|k) = F(q^{-1}) \tilde{C}^{-1}(q^{-1}) y(k) + E(q^{-1}) C^{-1}(q^{-1}) B(q^{-1}) u(k+p-d).

If C(q^{-1}) = I, then \tilde{C}(q^{-1}) = I and \tilde{A}(q^{-1}) = A(q^{-1}) are particular
solutions, and the p-step ahead predictor becomes

    \hat{y}(k+p|k) = F(q^{-1}) y(k) + E(q^{-1}) B(q^{-1}) u(k+p-d).

\epsilon(k+p), the prediction error corresponding to the optimal
predictor, is given by

    \epsilon(k+p) = E(q^{-1}) e(k+p) = e(k+p) + E_1 e(k+p-1) + ... + E_{p-1} e(k+1).

Remark: Similarly, the implicit prediction scheme as well as
the scheme for prediction by recursions through the process
model (as in Sec. 5.4) may also be developed for a
multivariable process.
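
For the particular case C(q^{-1}) = I, the identity (5B.6) reduces to
I = E(q^{-1}) A(q^{-1}) + q^{-p} F(q^{-1}), and the matrix coefficients
E_j and F_j follow by matching powers of q^{-1}. The Matlab sketch
below illustrates this special case only; the dimensions, the
coefficient matrices and the variable names are assumptions made for
the example.

```matlab
r = 2; n = 2; p = 3;                         % assumed sizes and horizon
% A_0 = I, A_1, A_2 (illustrative values); Acoef{i+1} holds A_i
Acoef = {eye(r), [-0.5 0.1; 0 -0.4], [0.06 0; 0.02 0.03]};

% E_0 = I; E_j = -sum_i E_{j-i} A_i for j = 1..p-1 (E{j+1} holds E_j)
E = cell(1, p);
E{1} = eye(r);
for j = 1:p-1
    S = zeros(r);
    for i = 1:min(j, n)
        S = S + E{j-i+1} * Acoef{i+1};
    end
    E{j+1} = -S;
end

% F_{j-p} = -(coefficient of q^-j in E*A), j = p..p+n-1
F = cell(1, n);
for j = p:p+n-1
    S = zeros(r);
    for i = max(1, j-p+1):min(j, n)
        S = S + E{j-i+1} * Acoef{i+1};
    end
    F{j-p+1} = -S;
end

% Verify I = E*A + q^-p F, coefficient by coefficient
maxErr = 0;
for j = 0:p+n-1
    S = zeros(r);
    for i = max(0, j-p+1):min(j, n)
        S = S + E{j-i+1} * Acoef{i+1};
    end
    if j >= p, S = S + F{j-p+1}; end
    if j == 0, S = S - eye(r); end
    maxErr = max(maxErr, max(abs(S(:))));
end
disp(maxErr);
```

With E and F available, the predictor for C = I is
F(q^{-1}) y(k) + E(q^{-1}) B(q^{-1}) u(k+p-d), as stated above.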



Appendix 6


THE COVARIANCE MATRIX FOR p-STEP PREDICTOR 

The covariance matrix P(k+p|k) is indicative of the degree
of confidence that the estimator has in \hat{x}(k+p|k), the p-step
ahead prediction of the state x(k). The derivation of
P(k+p|k) is presented in this appendix.

Since the multiple prediction (6.7.5) is obtained by 
recursion through the time-update stage alone, the 
prediction error is largely dependent on P(k|k) and the 
process noise covariance Q. 

The prediction error sequence,

    {\tilde{x}(k+p|k), p = k+1, k+2, ...},

is a Gauss-Markov process as shown below. Rewriting (6.7.6),
the prediction error

    \tilde{x}(k+p|k) = A^p \tilde{x}(k|k) + \sum_{i=k}^{m} A^{m-i} S w(i),    (6A.1)

where m = k+p-1.

Since the zero mean noise w possesses the property
(6.6.5), the process {\tilde{x}(k+p|k)} in (6A.1) is a zero mean
discrete-time Gaussian process. Writing (6A.1) for the horizon p-1,

    \tilde{x}(k+p-1|k) = A^{p-1} \tilde{x}(k|k) + \sum_{i=k}^{m-1} A^{m-1-i} S w(i).

Hence

    \tilde{x}(k+p|k) = A \tilde{x}(k+p-1|k) + S w(k+p-1),    (6A.2)

which is a Markov property and hence {\tilde{x}(k+p|k)}, the
prediction error process, is a Gauss-Markov process.

The prediction error covariance matrix

    P(k+p|k) = E[\tilde{x}(k+p|k) \tilde{x}(k+p|k)^T].    (6A.3)

From (6A.1) and (6A.3),

    P(k+p|k) = E[A^p \tilde{x}(k|k) (A^p \tilde{x}(k|k))^T]

             + E[\sum_{i=k}^{m} A^{m-i} S w(i) (\sum_{i=k}^{m} A^{m-i} S w(i))^T]

             + E[A^p \tilde{x}(k|k) (\sum_{i=k}^{m} A^{m-i} S w(i))^T]

             + E[\sum_{i=k}^{m} A^{m-i} S w(i) (A^p \tilde{x}(k|k))^T].    (6A.4)

The last 2 terms in (6A.4) will disappear, because

    E[\tilde{x}(k|k) w^T(i)] = E[x(k) w^T(i)] - E[\hat{x}(k|k) w^T(i)],    (6A.5)

where

    E[x(k) w^T(i)] = 0,    i >= k,

following (6.28b), and \hat{x}(k|k) can be expressed as a linear
combination of output measurements, which is uncorrelated
with the noise, and hence

    E[\hat{x}(k|k) w^T(i)] = 0.

Using (6.6.5), (6A.4) can be expressed as

    P(k+p|k) = A^p E[\tilde{x}(k|k) \tilde{x}(k|k)^T] (A^p)^T

             + \sum_{i=k}^{m} A^{m-i} S E[w(i) w(i)^T] S^T (A^{m-i})^T,

that is

    P(k+p|k) = A^p P(k|k) (A^p)^T + \sum_{i=k}^{m} A^{m-i} S Q(i) S^T (A^{m-i})^T.
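
The closed-form expression for P(k+p|k) can be checked numerically
against p repeated time updates P <- A P A^T + S Q S^T, which is the
covariance form of (6A.2). The Matlab sketch below does this for a
constant Q; the matrices A, S, Q, P(k|k) and the horizon p are
illustrative assumptions, not values from the text.

```matlab
A  = [0.9 0.1; 0 0.8];        % state transition matrix (assumed)
S  = [1; 0.5];                % noise input matrix (assumed)
Q  = 0.2;                     % process noise covariance, constant (assumed)
P0 = [0.5 0.1; 0.1 0.3];      % P(k|k) (assumed)
p  = 4;                       % prediction horizon

% Closed form: A^p P(k|k) (A^p)' + sum over j = m-i = 0..p-1
Pclosed = A^p * P0 * (A^p)';
for j = 0:p-1
    Aj = A^j;
    Pclosed = Pclosed + Aj * S * Q * S' * Aj';
end

% Recursion through the time-update stage alone
Prec = P0;
for j = 1:p
    Prec = A * Prec * A' + S * Q * S';
end

err = norm(Pclosed - Prec);
disp(err);
```

The two results should agree to round-off, which reflects the fact
that the sum in the closed form is simply the unrolled recursion.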
Appendix 7A


DETAILS OF SELECTED EXAMPLES OF CHAPTER 7

The supportive details of the example presented in Sec.7.5.2 
and Example 7.7(2) are given in this appendix. 

7A.1 The Electrical Power Load Problem: Sec.7.5.2 

It is assumed that 24-hourly data for 10 consecutive Mondays 
are available; here the first 10 sets of data from Appendix 7D 
are used for modelling. The objective is to produce 
prediction of the load for the following Monday. 

The available 10x24 data set is appended with 8 columns
of zeros to form the 10x32 data matrix A, which is
WH-transformed to A_w. The cumulative square of the elements of
each column of A_w is computed; the values are as
follows (for column-1 to column-32):

    (365165.5,    5.8,   15.2,   31.7,   72.5,   51.4,   64.8,   42.3,
      29848.9,    7.6,   75.2,   57.9,  158.0,    9.0,   30.4,  136.5,
      29049.9,    9.4,   85.0,   27.2,  142.2,   62.9,  256.0,  188.8,
      68240.6,   10.9,   21.9,   17.5,   64.9,  182.0,  339.8,   70.0) x 10^3

The series of elements of each of the 11 relatively dominant
columns of A_w is modelled using (7.5.1), and the estimated
parameters are given in Table 7A.1.



Table 7A.1 Estimated parameters

Column    Parameters of series of column elements
No.            f_0         f_1         f_2

  1        3191.580       0.660      -0.189
  9         590.266       0.920      -0.265
 17        2289.569       0.162      -0.500
 25       -2384.004       0.092      -0.011
 13        -190.700      -0.025      -0.533
 16         172.738      -0.517       0.041
 21          80.044      -0.057       0.350
 23         114.952       0.771      -0.516
 24        -232.961      -0.470      -0.251
 30         121.936       0.754      -0.638
 31         288.721       0.145      -0.754
For each of these 11 columns of A_w, the 11th column element is
predicted using (7.5.2). For the other columns of A_w, the
predicted 11th element is assumed to be the same as the 10th
element. The predicted 11th row of A_w so formed is reverse
transformed to produce the prediction of the load of the
11th Monday, which is shown in Fig. 7.5.1.
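
The transform, column-energy and reverse-transform steps described
above can be sketched in Matlab as follows. Several things here are
assumptions made for illustration: the data are random placeholders
rather than the values of Appendix 7D, the Walsh-Hadamard matrix is
taken in the natural ordering returned by hadamard without
normalisation, and the model-based prediction of the dominant columns
by (7.5.2) is not reproduced, so the 10th row is simply repeated.

```matlab
X  = 200 + 50*rand(10, 24);       % placeholder for the 10x24 load data
A  = [X, zeros(10, 8)];           % append 8 zero columns -> 10x32
H  = hadamard(32);                % 32x32 matrix of +1/-1, H*H' = 32*I
Aw = A * H;                       % WH transform of every row

colEnergy = sum(Aw.^2, 1);        % cumulative square of each column
[~, order] = sort(colEnergy, 'descend');
dominant = sort(order(1:11));     % the 11 relatively dominant columns

% Predicted 11th row of Aw: here every column repeats its 10th element;
% in the text the dominant columns are predicted with (7.5.2) instead.
row11 = Aw(10, :);

load11 = row11 * H' / 32;         % reverse transform
load11 = load11(1:24);            % drop the 8 padded positions

recErr = norm(Aw * H' / 32 - A);  % round-trip check of the transform
disp(dominant); disp(recErr);
```

Because the placeholder prediction repeats the 10th row, load11
reproduces the 10th day exactly; substituting model-based values for
the dominant columns of row11 is where the actual forecast would
enter.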

7A.2 Air Traffic Problem: Example 7.7(2) 

This series, on the number of Trans-Atlantic Airline
passengers (in thousands), is taken from Brown (1963,
p.429).

The monthly data for the 12 consecutive years (1949 - 
1960) are arranged in the 12 consecutive rows of matrix X: 


    112  118  132  129  121  135  148  148  136  119  104  118
    115  126  141  135  125  149  170  170  158  133  114  140
    145  150  178  163  172  178  199  199  184  162  146  166
    171  180  193  181  183  218  230  242  209  191  172  194
    196  196  236  235  229  243  264  272  237  211  180  201
    204  188  235  227  234  264  302  293  259  229  203  229
    242  233  267  269  270  315  364  347  312  274  237  278
    284  277  317  313  318  374  413  405  355  306  271  306
    315  301  356  348  355  422  465  467  404  347  305  336
    340  318  362  348  363  435  491  505  404  359  310  337
    360  342  406  396  420  472  548  559  463  407  362  405
    417  391  419  461  472  535  622  606  508  461  390  432


A 4x12 data window A(k) is assumed to move over X. So A(1)
is composed of the 1st 4 rows of X, A(2) is composed of the
2nd to 5th rows of X and so on. The SVDs of A(1) to A(9) are
computed. The singular values are shown in Table 7A.2.


Table 7A.2 Singular values (s_i(k)) for A(1) to A(9)

s_i(k)   k = 1       2       3       4       5       6       7       8       9

s_1     1120.1  1294.5  1461.3  1664.7  1902.2  2160.7  2399.4  2650.4  2914.3
s_2       20.9    33.8    34.1    41.7    41.8    23.8    39.6    32.7    40.1
s_3       15.9    15.7    27.7    27.9    16.5    13.7    17.2    30.0    31.5
s_4       11.7    15.0    15.0    10.6    10.6    11.2    11.1    10.4    21.8
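
The moving-window computation behind Table 7A.2 can be sketched in
Matlab as below. The matrix X is the 12x12 data matrix given above;
the variable names are assumptions made for the example, and the
sketch is not necessarily how the tabulated values were originally
obtained.

```matlab
X = [112 118 132 129 121 135 148 148 136 119 104 118
     115 126 141 135 125 149 170 170 158 133 114 140
     145 150 178 163 172 178 199 199 184 162 146 166
     171 180 193 181 183 218 230 242 209 191 172 194
     196 196 236 235 229 243 264 272 237 211 180 201
     204 188 235 227 234 264 302 293 259 229 203 229
     242 233 267 269 270 315 364 347 312 274 237 278
     284 277 317 313 318 374 413 405 355 306 271 306
     315 301 356 348 355 422 465 467 404 347 305 336
     340 318 362 348 363 435 491 505 404 359 310 337
     360 342 406 396 420 472 548 559 463 407 362 405
     417 391 419 461 472 535 622 606 508 461 390 432];

nWin = size(X, 1) - 3;            % 9 windows of 4 consecutive rows
sv = zeros(4, nWin);
for k = 1:nWin
    sv(:, k) = svd(X(k:k+3, :));  % singular values of A(k), descending
end
disp(sv);
```

Column k of sv corresponds to column k of Table 7A.2. The first
singular value dominates in every window, which indicates that each
A(k) is close to a rank-one matrix.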


Reference

[1] Brown, R.G. (1963): Smoothing, Forecasting, and
Prediction of Discrete Time Series, Prentice-Hall,
Englewood Cliffs, New Jersey.
Appendix 7B


DATA ON OZONE COLUMN THICKNESS


The atmospheric ozone column thickness measured at Arosa,
Switzerland, is reproduced here from [1].


Table 7B.1 Monthly Ozone column thickness in Dobson units 


Year   Jan  Feb  Mar  Apr  May June July  Aug Sept  Oct  Nov  Dec

1931                                      311  335  283  286  301
1932   318  347  370  394  360  347  334  299  292  287  293  281
1933   357  364  399  382  390  374  335  319  309  312  311  337
1934   334  321  392  358  365  355  328  321  282  287  291  297
1935   332  390  367  383  375  319  331  311  288  275  299  313
1936   329  393  398  384  373  352  328  315  303  310  298  307
1937   347  352  395  382  365  349  324  323  301  283  280  355
1938   337  370  325  392  384  336  325  325  296  280  285  299
1939   320  341  385  347  382  339  331  313  286  304  284  309
1940   387  400  418  430  403  388  346  323  310  292  302  338
1941   362  395  417  409  417  361  348  336  306  299  309  309
1942   400  422  373  408  376  347  325  309  284  272  298  313
1943   338  341  385  363  348  352  336  303  291  292  303  321
1944   300  365  385  360  349  351  319  306  290  293  298  320
1945   377  359  360  373  376  351  329  327  297  288  295  313
1946   336  352  380  361  355  344  318  307  276  291  297  316
1947   383  397  393  369  361  347  334  324  307  296  278  312
1948   341  371  348  374  353  345  345  311  299  281  286  321
1949   332  365  378  357  371  354  335  321  284  272  296  292
1950   352  365  365  382  374  354  322  316  292  288  287  340
1951   338  402  417  397  383  364  332  321  297  298  278  311
1952   378  384  411  386  385  359  341  320  317  297  302  332
1953   335  375  373  383  382  359  326  317  293  280  266    -
1954     -    -  373  415  389  362  348  329  305  285  291  284
1955   315  375  399  374  361  351  339  332  300  292  278  317
1956   341  402  381  395  365  360  327  308  287  284  286  312
1957   340  342  353  375  380  349  330  321  305  279  294  322
1958   361  351  411  417  369  369  349  330  306  312  305  326
1959   369  367  364  390  389  373  342  330  313  294  297  328
1960   349  397  405  400  382  353  339  315  306  299  284  332
1961   368  333  338  365  379  349  343  322  296  288  301  304
1962   352  362  428  400  363  349  338  302  296  271  298  309
1963   371  408  377  381  378  352  327  310  282  273  278  293
1964   301  335  347  378  363  330  326  323  299  288  274  310
1965   332  390  383  385  387  345  337  318  305  275  278  305
1966   354  338  401  380  372    -    -  326  307  282  321  319
1967   361  362  354  374  349  358  325  317  296  273  267  304
1968   342  383  376  379  349  351  337  335  308  275  270  331
1969   327  419  361  393  351  364  333  333  294  281  307  345
1970   331  417  414  419  389  360  331  323  291  278  271  309
1971   344  349  411  364  358  358  336  309  317  278  292  306

Reference

[1] Bloomfield, P. (1985): ‘Ozone column’, in Data, D.F.
Andrews and A.M. Herzberg (Eds.), Springer-Verlag, New
York, 75-76.
Appendix 7C


DATA ON ATMOSPHERIC CONCENTRATION OF CARBON DIOXIDE 


The atmospheric concentration of carbon dioxide in parts per
million for 22 consecutive years from 1959 to 1980 measured
at the Mauna Loa observatory in Hawaii is given
in Table 7C.1; the data are presented row-wise for each year
(from January to December) starting from 1959. The series is
extracted from [1].

Table 7C.1 Atmospheric conc. of carbon dioxide in ppm.


  Jan.    Feb.   March   April    May    June    July  August   Sept.    Oct.    Nov.    Dec.

315.16  315.97  316.37  317.40  317.96  317.82  316.23  314.54  313.60  313.03  314.57  315.32
316.10  316.68  317.37  318.79  319.63  319.29  317.86  315.55  313.85  313.64  314.61  315.81
316.54  317.34  318.12  319.06  320.20  319.44  318.24  316.52  314.57  315.13  315.75  316.73
317.70  318.29  319.37  320.25  320.84  320.43  319.35  317.13  316.01  315.19  316.42  317.47
318.45  318.82  319.72  321.06  321.87  321.22  319.44  317.48  315.89  315.83  316.72  317.98
319.17       -       -       -  322.08  321.92  320.42  318.58  316.68  316.65  317.60  318.49
319.32  320.36  320.82  322.06  322.17  321.95  321.20  318.81  317.82  317.37  318.93  319.09
319.94  320.98  321.81  323.03  323.36  323.11  321.65  319.64  317.86  317.25  319.06  320.26
321.65  321.81  322.36  323.67  324.17  323.39  321.93  320.29  318.58  318.60  319.98  321.25
321.88  322.47  323.17  324.23  324.88  324.75  323.47  321.34  319.56  319.45  320.45  321.92
323.40  324.21  325.33  326.31  327.01  326.24  325.37  323.12  321.85  321.31  322.31  323.72
324.60  325.57  326.55  327.80  327.80  327.54  326.28  324.63  323.12  323.11  323.99  325.09
326.12  326.61  327.16  327.92  329.14  328.80  327.52  325.62  323.61  323.80  325.10  326.25
326.93  327.83  327.95  329.91  330.22  329.25  328.11  326.39  324.97  325.32  326.54  327.71
328.73  329.69  330.47  331.69  332.65  332.24  331.03  329.36  327.60  327.29  328.28  328.79
329.45  330.89  331.63  332.85  333.28  332.47  331.34  329.53  327.57  327.57  328.53  329.69
330.45  330.97  331.64  332.87  333.61  333.55  331.90  330.05  328.58  328.31  329.41  330.63
331.63  332.46  333.36  334.45  334.82  334.32  333.05  330.87  329.24  328.87  330.18  331.50
332.81  333.23  334.55  335.82  336.44  335.99  334.65  332.41  331.32  330.73  332.05  333.53
334.66  335.07  336.33  337.39  337.65  337.57  336.25  334.39  332.44  332.25  333.59  334.76
335.89  336.44  337.63  338.54  339.06  338.95  337.41  335.71  333.68  333.69  335.05  336.53
337.81  338.16  339.88  340.57  341.19  340.87  339.25  337.19  335.49  335.51  336.63  337.74


Reference

[1] Keeling, C.D., R.B. Bacastow, and T.P. Whorf (1982):
‘Measurements of the concentration of Carbon Dioxide
at the Mauna Loa Observatory, Hawaii’, Carbon Dioxide
Review 1982, Ed. W.C. Clarke, Oxford Univ. Press,
Oxford, 377-384.
Appendix 7D


DATA ON ELECTRICAL POWER LOAD ON A SUBSTATION 

The electrical power load on a substation for 25 consecutive
Mondays of the year 1983 is presented here. The hourly data
are arranged row-wise in the following table for the
consecutive Mondays.


Table 7D.1 Power load in MWH 


Hours:  01   02   03   04   05   06   07   08   09   10   11   12

       139  131  122  123  139  157  188  239  256  265  213  229
       152  149  143  147  156  166  199  258  272  271  256  241
       148  144  141  142  148  170  208  257  265  258  255  223
       144  139  135  133  140  162  195  262  270  270  255  238
       158  146  140  141  144  165  211  270  279  282  270  257
       154  150  145  151  157  171  215  257  278  273  254  242
       163  154  152  152  155  175  222  251  279  269  269  249
       164  156  153  143  161  180  207  250  278  276  260  239
       178  169  166  112  172  183  221  256  270  266  262  252
       199  195  180  188  180  196  223  256  269  269  279  262
       205  194  185  176  187  199  229  251  267  282  263  253
       211  196  186  187  191  201  234  247  285  272  333  279
       223  213  207  196  199  212  225  249  259  281  274  271
       227  219  216  211  213  209  216  222  230  223  219  210
       223  217  206  197  199  203  211  238  268  272  283  271
       228  209  212  205  200  213  213  233  266  274  287  276
       239  230  225  203  202  205  208  230  256  276  277  281
       235  228  218  209  210  215  217  246  258  287  282  286
       229  213  211  209  208  211  220  230  261  276  252  278
       225  217  214  196  205  210  209  219  245  273  279  275
       220  213  207  203  203  207  210  226  252  276  292  272
       210  206  197  211  214  213  216  228  253  277  277  274
       220  217  215  214  212  215  210  219  252  275  275  272
       214  209  205  204  202  209  218  234  257  289  289  279
       210  200  199  197  199  209  224  253  279  290  285  279

(Contd.) 
Table 7D.1 Power load in MWH (Contd.)


Hours:  13   14   15   16   17   18   19   20   21   22   23   24

       218  205  195  198  197  214  253  237  273  245  199  158
       220  209  208  211  218  213  262  297  272  254  220  180
       214  206  201  205  209  209  255  279  274  254  207  167
       217  208  209  218  220  218  256  281  276  254  220  186
       226  223  228  225  217  224  253  288  286  270  217  185
       220  223  227  229  230  228  261  283  279  266  223  192
       234  222  222  218  223  217  259  299  283  273  235  192
       221  215  211  208  212  207  239  287  266  247  199  174
       234  237  240  237  233  226  251  302  294  283  248  203
       245  247  246  249  243  230  262  305  297  294  263  225
       239  248  241  239  226  237  251  289  302  280  265  226
       256  259  273  272  272  256  260  307  315  299  256  234
       256  257  263  282  253  248  248  320  306  288  286  237
       212  197  209  199  196  194  215  267  283  275  253  219
       259  261  264  266  254  241  252  302  298  297  283  254
       266  265  273  276  272  247  267  305  311  320  281  257
       265  266  278  264  265  241  248  303  310  300  284  259
       292  243  275  274  266  242  242  302  310  311  280  251
       273  260  268  266  259  247  212  296  313  307  276  251
       255  265  266  267  258  244  243  292  304  307  277  239
       263  268  270  256  253  232  229  288  297  297  267  241
       264  263  267  264  253  236  236  257  302  296  276  242
       269  253  268  271  265  240  231  273  299  294  275  237
       275  268  278  273  257  260  262  298  306  306  275  232
       257  256  264  264  261  240  258  300  318  281  265  234



Appendix 7E 


DATA ON UNEMPLOYMENT IN GERMANY 


The monthly figures on the number of people unemployed in 
Germany during the period 1948 to 1978 are given below. The 
data have been reproduced from [1]. The monthly data for 
each year are presented row-wise. 


Table 7E.1 Unemployment figures in Germany 


   Jan.    Feb.   March   April     May    June    July  August   Sept.    Oct.    Nov.    Dec.

 481971  476353  471803  469382  446943  451091  665016  784232  784126  739423  715128  759623
 962866 1068885 1168127 1232381 1256889 1383302 1302857 1308091 1313691 1316572 1383832 1558469
2200486 2288368 2155962 2074220 1942134 1808534 1739507 1635604 1566588 1508348 1595491 1976461
2113553 1948422 1850960 1736166 1673661 1611908 1584067 1543866 1502799 1476741 1570796 1931002
2106836 2172973 1848101 1728250 1602178 1534867 1431499 1372614 1309563 1276009 1496764 1955635
2081227 2060651 1631613 1479538 1400709 1312201 1237767 1186715 1148914 1169558 1331378 1747757
2217243 2275347 1629817 1473474 1299533 1198475 1109003 1042697  982184  977389 1117323 1464489
1975159 2000102 1578827 1047886  876414  790579  692089  630088  610855  627627  728116 1185850
1390335 1982469 1158631  754433  653274  586675  529246  503216  501687  516321  744959 1202533
1601562 1222789  804115  690039  587952  544987  467085  435506  436205  435751  557143 1320321
1533557 1418192 1201913  678161  556518  481151  422220  393066  387484  421322  497959 1022634
1445508 1203311  667087  466799  386004  314389  258048  235253  223206  235428  273606  512409
 684297  581440  297704  225051  184564  162558  140701  132931  130861  142001  158424  302812
 422744  321910  187840  153046  130387  115126  107704  111304  107883  112935  127792  239975
 286398  273789  205467  135430  109403   97466   93939   91344   91383  101984  131356  232653
 410047  416889  216323  143659  124037  112083  106111  104243  104507  114335  133024  252329
 337497  304690  227188  146634  126654  112166  105399  102835  100266  111462  126844  202086
 286334  291236  200978  126862  106541   95419   89018   85677   84974   92231  118962  177908
 268848  235816  141428  121288  107743  100697  101476  105743  112726  145804  216382  371623
 621156  673572  576047  501303  458461  400773  377235  359473  341078  360846  395004  526218
 672617  589707  459853  330851  264674  226552  202689  187778  174467  180223  196056  266372
 368585  374124  243212  155181  122967  110744  108016  103753  100477  107770  118849  192174
 286266  264080  197784  120550  103407   94767   98562   99460   97338  110849  129476  175058
 286171  254753  206472  160356  142890  135157  141975  145835  146740  170111  207990  269810
 375564  368952  268461  231219  208289  190224  196774  198266  194660  214880  235379  279237
 356352  347053  286576  240734  211276  200950  216616  221905  219271  266969  331839  485631
 620494  620154  561762  517365  456965  450684  490894  527051  556981  672312  799337  945916
1154295 1183501 1114048 1087078 1017716 1002135 1035235 1031122 1005495 1061128 1114190 1223396
1350990 1346723 1190159 1093693  953538  921037  944609  939528  896701  943685  984699 1089935
1248918 1213741 1084229 1039228  946498  930974  927624  963468  911239  954376 1004325 1090708
1213498 1224309 1098969 1000429  912997  877319  922230  923963  864274  901636  927043 1006724
1171353 1134060  957711  875452  775117  763173  803653  798667  736809  761724  798973  866783
1036519  992520  875909  825374  766768  781396  853077  864519  822565  888100  967533 1118500


Reference 

[1] Subba Rao, T., and M.M. Gabr (1984): An Introduction to
Bispectral Analysis and Bilinear Time Series Models,
Lecture Notes in Statistics, No. 24, Springer-Verlag,
New York, 240-243.
Appendix 7F


DATA ON HOMOGENEOUS INDIAN RAINFALL 

In India, the summer monsoon rainfall shows considerable
spatial variability. The data presented here concern the
spatially coherent rainfall pattern over the north-western
and central parts of India covering about 55% of the total
area of the country. The monthly rainfall data at 14
meteorological sub-divisions over the years 1871-1990 have
been used by Parthasarathy, Rupa Kumar and Munot (1993) to
prepare the homogeneous rainfall data set. The data from
1940 to 1990 are extracted and presented here.


Table 7F.1 Monthly rainfall in mm 


 Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.

 11.0   10.9   10.2   11.1   21.4  146.8  274.5  268.6   66.1   49.3   13.6    7.6
 17.8   11.1    6.1    3.2   10.5   90.5  218.0  175.9   96.6   19.2    0.7    1.7
 11.9   30.7    1.6   10.6    9.9  137.0  360.4  275.8  130.5    7.8    0.7   14.7
 32.2    1.8    1.2    9.1   32.1  122.4  300.7  137.7  187.7   64.7    3.9      0
 12.4   23.2   43.3    9.0    8.1   96.9  375.4  320.8  111.1   64.2    6.6    1.3
 19.9    1.0    0.6   19.6    8.4  147.2  331.5  212.3  202.1   28.6    3.0    0.3
    -   11.1    1.1   13.3   16.1  192.7  251.3  291.5  102.4   17.5   64.8   10.9
 15.4   10.4    6.6   14.8    6.7   62.3  256.2  309.7  236.2   24.6    3.0   11.7
 27.4    9.2    6.6    7.9    5.9  113.6  253.3  233.1  161.3   27.9   65.1    1.0
  1.4    5.8    1.3    4.6   31.0   95.2  279.7  184.8  228.3   62.7    1.9    0.4
  4.5   12.6   15.6    1.6    9.7   60.6  352.3  170.7  195.4   18.9    4.9    4.8
  4.3    1.8   22.6   14.5   17.4  100.8  222.9  194.4   82.8   49.2   14.3      0
  1.2    9.7    5.4    7.9   17.4  117.8  289.0  207.7   71.2   28.4    0.2    3.0
 15.6    1.2      0   12.3    2.5  113.4  252.3  302.1  132.2   55.9      0    0.6
  5.6   12.8    7.1    5.5    6.3   95.4  285.6  193.0  279.1   22.5    0.3    2.8
 13.9    1.3    3.7    7.1   17.1  156.0  160.9  333.9  221.9  119.0    3.0    0.2
  4.9    3.6    6.2    4.2   44.1  139.5  388.8  235.6  135.3  102.2   29.3    4.6
 14.8    2.6   31.3   14.4   18.6  105.4  245.2  261.8   81.4   38.6    8.9    2.1
  4.4    6.0    7.8   11.0   14.7   86.4  308.0  244.9  214.1   59.5   19.6    3.1
 12.7    3.6    0.8    8.9   21.5  123.8  357.7  255.4  234.8   76.4    9.8    0.3
 13.1      0   19.0    5.5   20.7  131.4  232.6  234.7   98.8   44.6    4.5    3.0
 12.8   13.1    2.2    8.2   30.7  143.4  328.1  249.0  265.5   85.5    5.9    4.5
  4.3    7.1   11.9   18.3   20.0   55.4  258.5  194.1  208.3   20.8   13.7   32.9
  1.9    5.6    8.7   13.2   13.5  112.7  207.0  310.3  126.9   51.7   12.0    1.1


Table 7F.1 Monthly rainfall in mm (contd.) 


 Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.

  0.7    3.0    3.1    4.4   10.6  125.5  267.8  275.8  164.4   36.3    5.5    1.3
  7.3    3.2    7.1   10.7    8.8   67.8  270.4  148.7  110.9    8.7    2.4    7.4
  9.2    4.8    3.9    4.6   26.8  102.4  232.3  157.8  122.9   11.4   27.8   10.0
  0.8    0.5   53.1    7.6    5.9  124.4  297.3  230.4  145.2   16.4    1.6   53.7
 11.6   12.1   16.1    9.6    5.5   69.9  256.1  169.5  113.8   31.2    4.8    2.1
  3.8    2.1    2.5    6.0   19.2   75.2  288.1  211.5  167.5   14.8   24.1    2.5
 13.8   12.3   10.7    6.7   21.1  196.3  192.6  330.0  187.8   23.6    0.1      0
  7.5    5.7    5.2    9.7   38.0  196.0  225.4  210.6  137.0   55.9    0.3      0
  1.0    5.8    0.8    6.4    6.7   96.8  150.0  215.8   73.2   15.3   21.7    1.7
  2.2    9.4    1.0    2.5   10.5   85.3  294.8  333.8  164.1   79.9    2.4    4.2
  0.3    2.8    2.3    4.9   31.9   72.3  205.2  203.6   82.0   94.3    3.3    2.8
  5.9    7.3    7.9    1.8   10.5  149.4  260.0  273.1  228.7  100.8    2.6    0.1
  6.2    4.3    3.9   10.1   11.8  125.3  287.3  273.3  125.1    3.1   39.2    0.8
  6.7    3.0    4.9   14.8   27.3  178.7  291.8  225.7  134.8   34.2   42.5    3.5
  6.8   20.3   14.6   11.9   15.0  174.3  276.7  277.4  100.7   23.6   23.0   14.2
 16.9   27.8    6.2    3.8   26.3  123.6  179.6  230.9  103.6   16.0   68.6    3.6
  3.4    1.6    6.1    9.2    6.0  202.0  235.8  252.9   94.9    7.9    3.6   19.4
 12.3    2.0   15.5    4.8   16.1  111.2  253.9  211.8  177.1   29.4   26.2    5.4
 24.1    6.9   14.8   17.9   39.8   79.6  219.1  249.1   89.0   32.4   39.9    2.8


Reference

[1] Parthasarathy, B., K. Rupa Kumar, and A. A. Munot (1993):
‘Homogeneous Indian monsoon rainfall: variability and
prediction’, Proc. of Indian Acad. of Science (Earth
Planet Sci.), 102(1), March, 121-155.






Appendix 8A 


DATA ON YEARLY AVERAGED SUNSPOT NUMBERS 

The count of the number of spots on the sun’s surface is of
interest in Astronomy and Climatology for geo-physical
reasons; the series is also of interest to time series
analysts for the time-varying nature of the series. The
daily observations from more than 50 observatories are used
to arrive at the relative values of the sunspot numbers; the
yearly averaged values for the years 1700 to 1987 are
presented here.

The data are arranged column-wise, starting from the 
value for the year 1700. 


Table 8A.1 Yearly averaged sunspot numbers 


5 

73 

47.6 

132.0 

41.1 

61 . 

5 

17.0 

42 . 

0 

5 . 

7 

K 


11 

47 

54.0 

130.9 

30.1 

98 . 

5 

11.3 

63 . 

5 

8 . 

7 



16 

35 

62.9 

118. 1 

23.0 

124 . 

7 

12.4 

53 . 

8 

36 . 

1 



23 

1 1 

85.9 

89.9 

15.6 

96 . 

3 

3.4 

62 . 

0 

79 . 

7 

BE 


36 

5 

61.2 

66.6 

6.6 

66 . 

6 

6.0 

48 . 

3 

114 . 

7 

B 


58 

16 

45.1 

60.0 

4.0 

64 . 

5 

32.3 

43 . 

9 

109 . 

6 

93 . 

8 

29 

34 

36.4 

46.9 

1.8 

54 . 

1 

54.3 

18 . 

6 

88 . 

8 

105 . 

9 

20 

70 

20.9 

41.0 

8.5 

39 . 

0 

59.7 

5 . 

7 

67 . 

8 

105 . 

5 

10 

81 

11.4 

21.3 

16.6 

20 . 

6 

63.7 

3 . 

6 

47 . 

5 

104 . 

5 

8 

111 

37.8 

16.0 

36.3 

6 . 

7 

3.5 

1 . 

4 

30 . 

6 

66 . 

6 

3 

101 

69.8 

6.4 

49.6 

4 . 

3 

52.2 

9 . 

6 

16 . 

3 

68 . 

9 

0 

73 

106. 1 

4. 1 

64.2 

22 . 

7 

25.4 

47 . 

4 

9 . 

6 

38 . 

0 

0 

40 

100.8 

6.8 

67.0 

54 . 

8 

13. 1 

57 . 

1 

33 . 

2 

34 

5 

2 

20 

81.6 

14.5 

70.9 

93 . 

8 

6.8 

103 . 

9 

92 . 

6 

15 . 

5 

11 

16 

66.5 

34.0 

47.8 

95 

8 

6.3 

80 . 

6 

151 . 

6 

12 

55 

27 

5 

34.8 

45.0 

27.5 

77 . 

2 

7.1 

63 . 

6 

136 . 

3 

27 . 

48 

47 

1 1 

30.6 

43. 1 

8.5 

59 . 

1 

35.6 

37 . 

6 

134 . 

7 

92 . 

66 

63 

22 

7.0 

47.5 

13.2 

44 

0 

73.0 

26 

1 

83 . 

9 

155 

28 

60 

40 

19.8 

42.2 

56.9 

47 

0 

85. 1 

14 . 

2 

69 . 

4 

154 

65 


60 

92.5 

28. 1 

121.5 

30 

5 

78.0 

5 

8 

31 . 

5 

140 

38 


809 

154.4 

10. 1 

138.3 

16 . 

3 

64.0 

16 . 

7 

13 . 

9 

116 

29 


834 

125.9 

8. 1 

103.2 

m 


41.8 

44 . 

3 

4 . 

4 

66 . 

63 


477 

84.8 

2.5 

85.7 

Bi 


26.2 

63 

9 

38 . 

0 

45 

85 

1 

478 

68. 1 

0.1 

64.6 

Efl 

B 

26.7 

69 

0 

141 . 

7 

17 

94 

21 

307 

38.5 

1.4 

36.7 

II 

1 
Appendix 8B


DATA ON VARIATIONS IN THE ROTATION RATE OF EARTH 

The variations in the rate of rotation of the Earth are
reproduced from [1]. The series has been of interest for the
possible relationship with the sunspot numbers (Appendix 8A)
and in general with the planetary system. Yearly data for
the period 1820 to 1970 are presented here. The measurements
are in units of $10^{-5}$ seconds.


Table 8B.1 Variations in the Earth’s rate of rotation 


Year  Value    Year  Value    Year  Value    Year  Value    Year  Value

1821   -217    1851     55    1881    -30    1911    361    1941    141
1822   -177    1852     51    1882    -12    1912    328    1942    150
1823   -166    1853     40    1883     11    1913    296    1943    157
1824   -136    1854     30    1884     57    1914    282    1944    143
1825   -110    1855     14    1885     92    1915    269    1945    138
1826    -95    1856      1    1886     86    1916    256    1946    137
1827    -64    1857      1    1887     53    1917    225    1947    151
1828    -37    1858     -4    1888     26    1918    202    1948    151
1829    -14    1859    -13    1889      6    1919    193    1949    136
1830    -25    1860    -56    1890    -12    1920    205    1950    111
1831    -51    1861    -83    1891    -35    1921    201    1951    105
1832    -62    1862   -104    1892    -31    1922    178    1952    105
1833    -73    1863    -93    1893      0    1923    139    1953    110
1834    -88    1864    -88    1894     36    1924    130    1954    104
1835   -113    1865    -75    1895     54    1925    101    1955     92
1836   -120    1866    -80    1896     65    1926     67    1956     96
1837    -83    1867   -101    1897    104    1927     22    1957    115
1838    -33    1868   -156    1898    166    1928      2    1958    144
1839    -19    1869   -226    1899    248    1929     12    1959    126
1840     21    1870   -293    1900    318    1930     26    1960    131
1841     17    1871   -333    1901    384    1931     21    1961    112
1842     44    1872   -347    1902    415    1932     10    1962    119
1843     44    1873   -329    1903    421    1933    -11    1963    139
1844     78    1874   -279    1904    402    1934    -12    1964    183
1845     88    1875   -205    1905    392    1935    -15    1965    206
1846    122    1876   -131    1906    387    1936      6    1966    231
1847    126    1877    -86    1907    391    1937     22    1967    244
1848    114    1878    -59    1908    396    1938     51    1968    239
1849     85    1879    -48    1909    400    1939     78    1969    263
1850     64    1880    -35    1910    391    1940    111    1970    273


Reference 

[1] Luo Shi-fang, L. Shi-guang, Y. Shu-hua, Y. Shao-zhong, 
and L. Yuan-xi (1977): ‘Analysis of periodicity in the 
irregular rotation of the earth’, Chinese Astronomy, 1, 
221-227. 
Appendix 9


DATA ON COD PROCESS IN THE OSAKA BAY 


The chemical oxygen demand (COD) can be considered to be an
index of water pollution in the sea. COD concentration is
monitored at a number of stations in the Osaka Bay along
with water temperature, transparency and dissolved oxygen
concentration. Altogether 84 sets of monthly data are
available, corresponding to the years 1976 to 1983.

The output variable is the COD concentration ($y$), which
is related to the input variables: water temperature ($x_1$),
water transparency ($x_2$) and dissolved oxygen concentration
($x_3$). $y_f$ are the observed values of filtered COD (i.e. COD
values of sea water free from suspended materials).


Table 9A.1 COD process data 


Year:                                      Year:
month   x1    x2    x3    yf    y          month   x1    x2    x3    yf    y

1976:                                      1978:
  04    14.9  2.0   9.3   3     3.8          01    10    3     8.6   1.1   3.1
  05    16.6  1.7   5.8   3.3   3.7          02    8.7   1.9   8.6   1.8   3.7
  06    21.3  2.1   9.1   2.2   3            03    7.5   1.5   9.8   1.6   2.9
  07    24.3  2.9   7     3.4   4.7          04    12.5  1.7   10    2.2   4.7
  08    26.6  1.7   4.8   2.9   4.1          05    17.7  1.1   11    1.5   5.4
  09    23.2  1.3   4.7   1.6   3.1          06    21.4  1.9   8.6   2.8   5.5
  10    22.2  2.5   4.5   1.3   3.1          07    26.3  2.3   4.8   2.5   3.7
  11    18.1  2.5   5.9   2     2.5          08    28.7  1.1   8.2   3.8   8.1
  12    13.7  2.8   7.9   3.3   3.3          09    26.7  2.5   8.6   2     4.1
1977:                                        10    23.2  2.5   5.4   1.6   2.9
  01    7.7   2.6   9.3   0.8   2.3          11    19.1  3     5.1   1.7   3.1
  02    6.9   2.3   10.3  1.5   3.3          12    14.2  3     7.1   2.1   2.3
  03    7.4   1.9   9.4   3.7   3.8        1979:
  04    11.3  2     8.4   2.2   3            01    12.5  2.5   7.4   1.8   2.9
  05    17.6  2     8.2   2.4   2.7          02    9.3   2.4   9.4   2.2   2.8
  06    19.5  2.4   6.5   1     3.3          03    9.5   2.5   9.4   2.1   2.7
  07    21.6  2.5   5.1   1.3   4.9          04    12.5  2.5   9.1   2.1   4.3
  08    26.5  1.9   6.3   1.8   4.8          05    16.0  2.1   10.0  2.9   4.9
  09    26.4  2.2   8.1   2.8   4.4          06    19.7  1.6   10.0  3.0   5.9
  10    23.1  3.8   4.9   2.2   2.2          07    22.8  3.0   5.0   2.0   3.2
  11    21.1  2.5   5.1   0.8   1.8          08    28.2  1.0   10.0  3.3   8.4
  12    15.2  2.5   6.5   1.2   2.9          09    24.6  2.8   6.4   2.1   4.3

(Contd.)
Table 9A.1 COD process data (contd.)

Year:                                      Year:
month   x1    x2    x3    yf    y          month   x1    x2    x3    yf    y

1979:                                      1982:
  10    22.2  2.3   5.9   2.6   3.6          01    8.5   2.5   8.4   1.8   3.4
  11    17.9  2.5   6.7   2.0   3.0          02    8.1   4.0   8.5   1.3   1.7
  12    14.3  4     7.5   1.5   3.1          03    8.2   2.6   8.7   1.3   2.1
1980:                                        04    14.8  2.0   10    2.1   5.3
  01    7.3   2.0   9.4   2.5   3.4          05    17.5  1.5   11    3.2   6.8
  02    6.6   2.0   11    3.0   4.6          06    19.3  2.7   6.5   2.6   4.2
  03    8.8   2.3   [illegible]  3.0  3.8    07    23.1  2.1   6.2   3.2   6.3
  04    13.6  2.7   10    2.7   3.6          08    26.1  1.2   9.4   3.9   7.2
  05    13.3  2.5   7.6   5.1   5.6          09    23.1  2.5   3.7   2.8   3.1
  06    18.6  1.9   6.8   3.9   5.1          10    21.7  2.3   6.1   1.8   3.0
  07    23.6  2.0   9.1   3.0   4.2          11    19.5  2.5   5.5   1.8   2.9
  08    26.9  2.1   7.2   4.6   8.2          12    14.3  3.6   7.0   2.0   2.3
  09    25.0  3.0   4.2   2.7   3.5        1983:
  10    22.1  3.0   5.8   3.6   6.9          01    8.6   3.3   8.4   1.9   2.3
  11    18.2  5.0   6.7   3.7   3.8          02    8.5   3.5   10    2.1   2.5
  12    16.6  3.5   6.0   2.0   4.0          03    8.3   3.0   9.1   1.7   1.9
1981:
  01    5.1   2.7   10    3.5   3.8
  02    6.7   2.0   10    2.9   3.9
  03    7.5   2.5   9.0   2.6   2.8
  04    13.1  2.7   8.8   1.3   1.9
  05    14.7  1.8   5.1   2.3   3.2
  06    17.1  1.5   3.7   1.7   2.7
  07    19.8  1.0   1.8   1.9   2.1
  08    23.9  1.0   1.5   0.8   2.0
  09    25.8  2.0   4.6   1.4   2.0
  10    23.0  2.0   6.3   1.2   2.5
  11    19.6  2.8   6.7   1.2   1.9
  12    15.2  2.5   7.1   1.0   1.6








References

Remark: Data from 1976 to 1981 appear in [1]. The rest are
obtained from [2].

[1] Shin-Ichi, Fujita and Hiroshi, Koi (1984): ‘Application
of GMDH to Environmental System Modelling and
Management’ in Self-organizing Methods in Modelling,
Ed. S.J. Farlow, Marcel Dekker, New York.

[2] Shin-Ichi, Fujita (1991): Private communication.







Appendix 10 


GENERALIZED DELTA RULE 


The derivation of the generalized delta rule (GDR), which is
due to Rumelhart, Hinton and Williams (1986), is presented
here. GDR gives an expression for the adaptive change in the
weights on the interconnections between the nodes (see
Fig. 10.2.2), minimizing the cost

$$J_Z = \frac{1}{2}\sum_{j=1}^{N} E_{Zj}^2, \tag{10A.1a}$$

with the error, $E_{Zj}$, defined as

$$E_{Zj} = (y_j - O_{Zj}), \tag{10A.1b}$$

where $y_j$ and $O_{Zj}$ are the desired output and the computed
output respectively of the jth node of the output layer Z,
which has N nodes.

Define $O_{Qj}$ and $S_{Qj}$ as the nodal output and the sum of
all weighted inputs respectively at the jth node of the Qth
layer (Fig. 10A.1). The inputs are $O_{Pi}$, and $W_{ij}$ is the
weight on the interconnection between the i-th node of one layer
and the j-th node of the next layer. Assume the layers from the
output end are Z, Y, X etc., having N, M, L, etc. number of
nodes respectively. $F_{Qj}$ is the threshold level.



Figure 10A.1 Topology of a generic node Qj 


GDR is based on the gradient descent algorithm, according to
which the adaptation in the weights $W_{ij}$ is proportional to
the gradient of the error; for example, for the jth node of the Z layer,

$$\Delta W_{ij} \propto -\frac{\partial J_Z}{\partial W_{ij}} = -\frac{\partial J_Z}{\partial O_{Zj}}\,\frac{\partial O_{Zj}}{\partial S_{Zj}}\,\frac{\partial S_{Zj}}{\partial W_{ij}}. \tag{10A.2}$$

Here

$$S_{Zj} = \sum_{i} W_{ij} O_{Yi} + F_{Zj}.$$

Hence

$$\frac{\partial S_{Zj}}{\partial W_{ij}} = O_{Yi}. \tag{10A.3}$$

Again

$$O_{Zj} = f_{Zj}(S_{Zj}),$$

and assuming $f(.)$ to represent sigmoidal nonlinearity:

$$f(x) = \frac{1}{1+e^{-x}},$$

$$\frac{\partial O_{Zj}}{\partial S_{Zj}} = \frac{\partial f_{Zj}(S_{Zj})}{\partial S_{Zj}} = \frac{e^{-S_{Zj}}}{(1+e^{-S_{Zj}})^2} = \frac{1}{(1+e^{-S_{Zj}})}\left[1 - \frac{1}{(1+e^{-S_{Zj}})}\right] = O_{Zj}(1-O_{Zj}). \tag{10A.4}$$

Again, following (10A.1),

$$\frac{\partial J_Z}{\partial O_{Zj}} = -(y_j - O_{Zj}).$$

Hence using (10A.2), (10A.3) and (10A.4),

$$\Delta W_{ij} = \alpha\,(y_j - O_{Zj})\,O_{Zj}(1-O_{Zj})\,O_{Yi},$$

where $\alpha$ ($0<\alpha<1$) is the proportionality constant. Thus

$$\Delta W_{ij} = \alpha\,O_{Yi} D_{Zj},$$

where

$$D_{Zj} = (y_j - O_{Zj})\,O_{Zj}(1-O_{Zj})$$

is the discrepancy or error corresponding to the node $Z_j$.
Hence the error at the output of each node of layer Y due to
the error $D_{Zj}$ for j = 1 to N:

$$e_{Yi} = \sum_{j=1}^{N} W_{ij} D_{Zj}.$$

So the error at the outputs of the nodes $Y_i$ is

$$J_Y = \sum_{i=1}^{M} e_{Yi} O_{Yi}. \tag{10A.5}$$
Hence

$$\frac{\partial J_Y}{\partial O_{Yi}} = e_{Yi} = \sum_{j=1}^{N} W_{ij} D_{Zj}.$$

Now it is intended to compute changes in the weights $V_{hi}$ on
the interconnections between X layer nodes and Y layer
nodes.

$$\Delta V_{hi} \propto \frac{\partial J_Y}{\partial V_{hi}} = \frac{\partial J_Y}{\partial O_{Yi}}\,\frac{\partial O_{Yi}}{\partial S_{Yi}}\,\frac{\partial S_{Yi}}{\partial V_{hi}}.$$

Using (10A.5), (10A.3) and (10A.4),

$$\Delta V_{hi} = \beta\Big[\sum_{j=1}^{N} W_{ij} D_{Zj}\Big]\,O_{Yi}(1 - O_{Yi})\,O_{Xh} = \beta\,D_{Yi} O_{Xh}.$$

$\beta$ is the proportionality constant, and

$$D_{Yi} = \Big[\sum_{j=1}^{N} W_{ij} D_{Zj}\Big]\,O_{Yi}(1 - O_{Yi})$$

is the discrepancy or error corresponding to node $Y_i$.

If the nodes have threshold level inputs, the change in
the threshold levels for the $Z_j$ nodes is given by

$$\Delta F_{Zj} = \alpha D_{Zj}.$$

Similarly, for the $Y_i$ node

$$\Delta F_{Yi} = \beta D_{Yi}.$$

The procedure continues to earlier layers, if any.

Remark: A modified version of the algorithm, referred to as the
recurrent backpropagation algorithm (Pineda, 1987), can be
used for the training of recurrent networks.
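
The update rules of this appendix can be sketched in a few lines of NumPy for a network with layers X, Y and Z. This is only an illustrative sketch: the function name `gdr_step`, the layer sizes, the single training pattern and the values of the proportionality constants are assumptions made here, not part of the original text.

```python
import numpy as np

def sigmoid(x):
    """Sigmoidal nonlinearity f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def gdr_step(o_x, y, V, F_y, W, F_z, alpha=0.5, beta=0.5):
    """One GDR update for layers X -> Y -> Z; V, F_y, W, F_z are changed in place.

    V[h, i] is the weight from node X_h to node Y_i,
    W[i, j] is the weight from node Y_i to node Z_j (hypothetical layout).
    Returns the cost J_Z evaluated before the update.
    """
    # Forward pass: weighted sums plus thresholds, then the sigmoid.
    o_y = sigmoid(o_x @ V + F_y)
    o_z = sigmoid(o_y @ W + F_z)

    # Discrepancies D_Zj and D_Yi (computed with the weights before the update).
    d_z = (y - o_z) * o_z * (1.0 - o_z)
    d_y = (W @ d_z) * o_y * (1.0 - o_y)

    # Weight and threshold changes.
    W += alpha * np.outer(o_y, d_z)    # Delta W_ij = alpha * O_Yi * D_Zj
    F_z += alpha * d_z                 # Delta F_Zj = alpha * D_Zj
    V += beta * np.outer(o_x, d_y)     # Delta V_hi = beta * D_Yi * O_Xh
    F_y += beta * d_y                  # Delta F_Yi = beta * D_Yi

    return 0.5 * np.sum((y - o_z) ** 2)

# Illustrative run on a single made-up training pattern.
rng = np.random.default_rng(0)
L, M, N = 2, 3, 2
V = rng.normal(scale=0.5, size=(L, M))
F_y = np.zeros(M)
W = rng.normal(scale=0.5, size=(M, N))
F_z = np.zeros(N)
o_x = np.array([0.2, 0.8])
y = np.array([0.9, 0.1])

costs = [gdr_step(o_x, y, V, F_y, W, F_z) for _ in range(500)]
print(f"initial cost {costs[0]:.4f}, final cost {costs[-1]:.6f}")
```

The discrepancies of layer Y are computed before the weights W are changed, so that both layers are adapted with respect to the same forward pass.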
SVR SPECTRUM

SVR spectrum is a method of determining the period length of
periodic components present, if any, in any signal or data
sequence; the periodic components need not be sinusoidal.
The data {y(k)} are arranged into the consecutive rows of a
matrix (A) which is singular value decomposed; the generic
term Singular Value Ratio (SVR) spectrum stands for the
spectrum of a function (usually squared ratio) of the most
dominant and other singular values against varying row
lengths of the data matrix.

The singular value decomposition (SVD) of an m×n matrix
A is given by

$$A = USV^T = \sum_{i=1}^{p} u_i s_i v_i^T, \quad p = \min(m,n),$$

$u_i$ and $v_i$ being the column vectors of U and V respectively;
the singular values ($s_i$) of A appear in nonincreasing order
down the diagonal of S (see Sec. 7.6). If the data series
{y(k)}, which is contained in the consecutive rows of A, is
perfectly periodic with period length n, the rows will be
aligned with respect to each other, and A will be a rank-one
matrix. Hence the first singular value $s_1$ will be large,
whereas the other singular values will be zero. However, if
the row length of A is different from n, there will be
misalignment between the rows and A will no longer be a
rank-one matrix, resulting in a relatively lower value of $s_1$
with respect to the other singular values.

Hence the ratio,

$$\rho(n) = s_1/s_2, \tag{11A.1}$$

may be used as an index indicating the presence of a
periodic component of period length n in a data sequence
{y(k)}, where $s_1$ and $s_2$ are the first two singular values of
the matrix A of row length n, which contains the data. The
distribution of $\rho(n)$ against n is termed the SVR
spectrum. It has two prime features:

(i) The spectrum will show a peak at the period length (N)
of the strongest periodic component present in the data
sequence;

(ii) the spectrum will also show peaks at multiples of this
period length (i.e. at N, 2N,...).
Remarks

(1) In the case of a dynamic series, the concept of an
(overlapping) moving window may be considered (see Appendix
7A.2) for the formation of the data matrix A(K), where K is
the window index. For each row length, the ratio $s_1/s_2$ for
each A(K) is computed and the averaged (or median, i.e.
centre of the serially arranged data set in order of
magnitude) value is considered as the $\rho(n)$ for constructing
the SVR spectrum.

(2) In place of $\rho(n)$, an alternative expression in terms of
the relative energy in the first decomposition component may
be used, which is given by

$$\rho_E(n) = s_1^2 \Big/ \sum_{i=1}^{p} s_i^2,$$

where

$$\sum_{i=1}^{p} s_i^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2, \quad [a_{ij}] = A,$$

A being an m×n matrix.

Summary for computation of SVR spectrum

(1) The data series is arranged into an m×n matrix A.

(2) A is SV-decomposed.

(3) $\rho(n) = s_1/s_2$ is computed.

(4) n is incremented to n+1, and steps 1 to 3 are repeated
until m = 2 or n is too large to be of interest.

(5) The plot of $\rho(n)$ against n is the desired SVR spectrum.
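
The five steps of the summary can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `svr_spectrum`, the small constant `eps` that guards against division by a zero second singular value, and the synthetic test series are all choices made here and do not come from the original text.

```python
import numpy as np

def svr_spectrum(y, n_min=2, n_max=None, eps=1e-12):
    """Return {n: s1/s2} for row lengths n = n_min .. n_max.

    The series is packed row-wise into an m x n matrix (leftover samples
    at the end are dropped), and only the singular values are computed.
    """
    y = np.asarray(y, dtype=float)
    if n_max is None:
        n_max = len(y) // 2          # keeps at least two rows (m >= 2)
    spectrum = {}
    for n in range(n_min, n_max + 1):
        m = len(y) // n
        if m < 2:
            break
        A = y[:m * n].reshape(m, n)
        s = np.linalg.svd(A, compute_uv=False)   # singular values only
        spectrum[n] = s[0] / max(s[1], eps)
    return spectrum

# Synthetic, non-sinusoidal series of period 7 with a little noise (made up).
rng = np.random.default_rng(0)
pattern = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 6.0])
y = np.tile(pattern, 40) + 0.01 * rng.standard_normal(7 * 40)

spec = svr_spectrum(y, n_max=20)
best = max(spec, key=spec.get)
print("row length with the largest ratio:", best)
```

Only the singular values are requested from the decomposition, since the singular vectors are not needed for the spectrum.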

Features 

(1) If the discrete-time signal or data sequence is a
sampled version of a continuous-time phenomenon, the more
closely the data are sampled, the more accurate is the
detection of the period length by the SVR spectrum.

(2) SVR spectrum requires only the singular values of A;
the singular vector matrices U and V need not be computed.

(3) It is observed that high values of $\rho(n)$ are obtained
for low values of n, where the series is almost linear. This
initial monotonically decreasing part of the SVR spectrum
should be ignored.

(4) In the case of noninteger period length of a periodic
component, the peaks tend to drift at multiples of the
length at which the first peak occurs. In such cases, the
series may be expanded using equispaced interpolated data;
SVR spectrum on the expanded series is expected to produce
closer detection of the period length of interest.

Examples 

Three examples are presented as follows: 

(a) The ozone column series (Appendix 7B): This is a monthly
data series with yearly periodicity. The SVR spectrum
(Fig. 11A.1) shows repeating peaks at multiples of the period
length 12, confirming the presence of a component of period
length 12.

(b) A white Gaussian noise process: The SVR spectrum of
this series (Fig. 11A.2) shows absence of any dominant
periodic component.

(c) A chaotic series {x(k)} generated by the Mackey-Glass
equation:

$$x(k+1) - x(k) = \frac{\alpha\,x(k-\tau)}{1 + x^{\gamma}(k-\tau)} - \beta\,x(k),$$

with $\alpha$ = 0.2, $\beta$ = 0.1, $\gamma$ = 10 and $\tau$ = 30. This series is
discussed in Sec. 8.3.2. The SVR spectrum in Fig. 11A.3
confirms the absence of any periodic component.
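
The Mackey-Glass recursion of example (c) can be generated directly from the difference equation. The sketch below is only illustrative: the function name `mackey_glass`, the constant initial history `x0 = 1.2` and the series length are assumptions made here, since the original text does not state the initial conditions used.

```python
import numpy as np

def mackey_glass(n_samples, alpha=0.2, beta=0.1, gamma=10, tau=30, x0=1.2):
    """Iterate x(k+1) = x(k) + alpha*x(k-tau)/(1 + x(k-tau)**gamma) - beta*x(k).

    The first tau+1 values (the required history) are set to the
    constant x0, which is an assumption of this sketch.
    """
    x = np.full(n_samples + tau, x0, dtype=float)
    for k in range(tau, n_samples + tau - 1):
        delayed = x[k - tau]
        x[k + 1] = x[k] + alpha * delayed / (1.0 + delayed ** gamma) - beta * x[k]
    return x[tau:]

series = mackey_glass(1000)
print(series[:5])
```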

The use of SVR spectrum for decomposition of periodic
components is discussed in Sec. 11.5; the problem of the
extraction of a signal from the composite signal using SVR
spectrum is addressed in Sec. 14.5.4.
The identification of the period length is based on
closeness to rank-oneness; any feature of the series that
influences the rank-oneness or near orthogonality of a
component with respect to the rest will affect the
detection of period length. Although SVD is used in the
present study, other orthogonal decompositions like the QR
decomposition may be used for similar application.

SVR spectrum is a recently developed concept (Kanjilal
and Palit, 1995); further research is expected to detail its
prospects and limitations.



Figure 11A.3 SVR spectrum of the Mackey-Glass equation 
modelling a chaotic process. 
Appendix 12A


SYSTEMS AND CONTROLS BASICS 

Some basic concepts related to systems, models and control 
methods are introduced here. 

Every system or process is, by itself, a continuous-time
process. However, for considerations related to
measurement and computation, process measurements are often
recorded at discrete-time intervals, and the monitored
process (Fig. 12A.1) is loosely called a discrete-time
process.

Consider a discrete-time model

$$y(k) + a_1 y(k-1) + a_2 y(k-2) = b_0 u(k-d) + b_1 u(k-d-1) + b_2 u(k-d-2),$$

where u is the input to the process and y is its output;
the output responds to a change in control action after d
sampling intervals. A concise expression using discrete-time
polynomial operators follows as

$$A(q^{-1})\,y(k) = B(q^{-1})\,u(k-d),$$

where

$$A(q^{-1}) = 1 + a_1 q^{-1} + a_2 q^{-2},$$

$$B(q^{-1}) = b_0 + b_1 q^{-1} + b_2 q^{-2}.$$

The process is also expressed as

$$y(k) = \frac{q^{-d} B(q^{-1})}{A(q^{-1})}\,u(k) = G(q^{-1})\,u(k),$$

where $G(q^{-1})$ is the transfer function of the process. Two
basic properties of a system are the nonminimum-phasedness
and stability.

Figure 12A.1 Schematic of an open-loop process with u
as input and y as the output; the process measurements
are recorded at discrete-time intervals.

The process is called open-loop stable, if its poles
(i.e. roots of $A(q^{-1})$) lie inside the unit circle; the
converse is also true. The term open-loop implies that the
information on the process output is not available to the
mechanism that generates the input to the process.

A discrete-time process is defined to be nonminimum-phase,
if it has unstable zeros (i.e. roots of the
numerator, $B(q^{-1})$, lying outside the unit circle in the
z-plane) (Franklin and Powell, 1980). Such a process is also
said to have an unstable inverse. Since a discrete-time
process is basically obtained through the sampling of the
continuous-time processes, the effect of sampling on the
stability of the system deserves careful consideration. If a
continuous-time process with a stable inverse is sampled too
fast, the discrete-time process so generated can have an
unstable inverse. Conversely, even if the continuous-time
process has an unstable inverse, it can be sampled at a slow
rate and the resulting discrete-time process can have a
stable inverse. Again, too slow a sampling rate may fail to
capture fully the process dynamics and thereby lack in
representativeness.

Example 12A Consider a discrete-time process with a transfer
function

$$G(q^{-1}) = \frac{b_1 q^{-1} + b_2 q^{-2}}{1 + a_1 q^{-1} + a_2 q^{-2}}. \tag{12A.1}$$

Consider two different sets of parameters for this process:

(a) $a_1$ = -1.4, $a_2$ = 0.6, $b_1$ = 0.6, $b_2$ = -0.4.

Here, the poles are at 0.7+j0.3317 and 0.7-j0.3317 and the
zeros are at 0 and 0.667 (the unity time delay represents the
zero at 0) in the z-plane.

(b) $a_1$ = -1.4, $a_2$ = 0.6, $b_1$ = 0.4, $b_2$ = -0.6.

Since the numerator has a root at 1.5, the process is a
nonminimum-phase process in this case.

The response of the process (12A.1) to a unit step
input (at time k = 10) for the two different parameter sets
mentioned above is shown in Fig. 12A.2. The process has
unity steady-state gain (obtained for q=1 in (12A.1)). □□
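
The pole and zero locations of Example 12A, and the step responses, can be checked numerically with a short sketch. The helper name `simulate`, the simulation length of 200 samples and the dictionary layout are assumptions made for illustration only.

```python
import numpy as np

def simulate(a1, a2, b1, b2, u):
    """Difference-equation form of (12A.1):
    y(k) = -a1*y(k-1) - a2*y(k-2) + b1*u(k-1) + b2*u(k-2)."""
    y = np.zeros(len(u))
    for k in range(2, len(u)):
        y[k] = -a1 * y[k - 1] - a2 * y[k - 2] + b1 * u[k - 1] + b2 * u[k - 2]
    return y

# Unit step applied at k = 10.
u = np.zeros(200)
u[10:] = 1.0

parameter_sets = {
    "a": (-1.4, 0.6, 0.6, -0.4),
    "b": (-1.4, 0.6, 0.4, -0.6),
}

results = {}
for label, (a1, a2, b1, b2) in parameter_sets.items():
    poles = np.roots([1.0, a1, a2])          # roots of z^2 + a1*z + a2
    zero = -b2 / b1                          # nonzero root of b1*z + b2
    gain = (b1 + b2) / (1.0 + a1 + a2)       # G evaluated at q = 1
    y = simulate(a1, a2, b1, b2, u)
    results[label] = {"poles": poles, "zero": zero, "gain": gain, "y": y}
    print(label, "poles:", poles, "zero:", zero, "gain:", gain, "final y:", y[-1])
```

With the parameter values as printed, the gain expression evaluates to +1 for set (a) and to -1 for set (b), so the step response of set (b) settles at a negative value; the printed parameters of set (b) may deserve a check against Fig. 12A.2.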
Figure 12A.2 Response of the system shown in Fig. 12A.1
to unit step input with parameter set (12A.2(a)) and
with parameter set (12A.2(b)); the
negative going output in the latter case is due to the
nonminimum-phase nature.


A system or process with input u and output y is said
to be causal if the output at any time is dependent on
inputs up to that time; the output is also called the
controlled variable. Causality is not a symmetric property,
that is, y can be a causal output to input u but not
vice versa.

A system is said to be time invariant if the response
of the system, i.e. the relationship between the output and
the input, does not vary with time. Again, a system is said
to be linear if its inputs and outputs satisfy the principle
of superposition; in other words, the output response of a
linear time-invariant system due to a number of inputs is
equal to the summation of the output responses corresponding
to the individual inputs. For example, if the outputs $y_1$ and
$y_2$ are obtained for the inputs $u_1$ and $u_2$ respectively to the
system, the output response due to the input ($u_1+u_2$) will be
($y_1+y_2$).

The system shown in Fig. 12A.3 is said to be a closed-loop
system, where the control input at any point in time is
determined in consideration of the system response y up to
that time. The controller is fed with the deviation of the
system response (used as the feedback signal) from the
desired set point. In other words, at every time k, the
measurement of the system output is fed back to the
controller (through some intermediate stages). To be
precise, a control system is referred to as a feedback
control system, if it utilizes the observations or
measurements that are fed back but ignores the subsequent
feedback information; a control system is said to be a
closed-loop control system, if in addition to utilizing the
feedback information on the process output, it is understood
that the subsequent feedback information as well as the
associated statistics will be available, i.e. the loop will
stay closed.

Figure 12A.3 Schematic diagram of a closed-loop system.

The basic objective of process control is to make the
process perform so that its output remains at the desired
level. The type of control depends mainly on the nature of
process dynamics. If the process characteristics do not vary
with time, the process is called a deterministic process; in
such cases the controller parameters will need to be tuned
only once, or very infrequently.

It is necessary to resort to adaptive control when
process dynamics vary with time. The two main units within
an adaptive controller (Fig. 12A.4) are the process (or
controller) parameter estimator and the controller itself.
Different designs are possible for the adaptive controller.
If the change in the process dynamics is predictable from
the available measurements or other information, gain
scheduling can provide satisfactory control. In gain
scheduling, the process parameters need not be estimated;
here the controller parameters are in the form of a look-up
table for various ranges of measurements over various modes
of operations. The updating mechanism for the table of
parameters based on controller performance may or may not be
there. If the dynamics of the process vary so unpredictably
that the controller parameters cannot be precalculated from
the available measurements, the process (or controller)
parameters are estimated and are used in the design of the
controller. The self-tuning control, discussed in Sec. 12.2,
belongs to this category.

Figure 12A.4 Schematic diagram of a typical adaptive
control system.

The controller should be optimal. The word optimal, by
itself, is not very meaningful; a cost function has to be
specified which the control algorithm has to minimize with
respect to a process model. The quality of control performance
will depend on the rationale of the cost function. In
real life, optimal performance is difficult to achieve
because real-life problems conform neither to the optimality
criteria nor to the mathematical model of the process.
However, optimal control provides a guideline for the design
of the controller and the idealized performance.

Most real-life processes are not deterministic and
hence not exactly known. There is a factor of uncertainty in
the mathematical model of the process. Stochastic control
concerns control of processes in the presence of uncertainty.
The control performance is not expected to be optimal,
and hence the control is referred to as suboptimal control.
Suboptimal is a mathematically undefined term, which is
expected to mean ‘close to optimal’. Adaptive control
(Fig. 12A.4) is a suboptimal control which incorporates

(a) some method of estimation of the process parameters, and

(b) a suitable control strategy to produce the control law
in consideration of the latest process parameters.

One of the desirable properties of a controller is
robustness. A robust control system is one that continues to
perform in accordance with the design criteria, irrespective
of the change in the system dynamics, and hence in the system
behaviour, from its mathematical model.

References
SMITH PREDICTOR

One of the fundamental works in predictive control is
by Smith (1957), who designed a controller essentially free
from the effects of the time delay.

A typical process shows an inherent time delay between
the input u and the process output y (Fig. 12B.1). Let the
time delay be expressed as $G_d$, given by $\exp(-s\tau)$, in
continuous time, s being the Laplace operator. The control
action can at best force the output to be equal to the set
point in a time equal to the dead time of the process.
Faster control performance is not possible. Under-estimation
of the time delay and unnecessary control action can lead to
instability.



Figure 12B.1 A typical time-delay process. 


To take account of the time delay of the process,
instead of operating on the output error, $(w(t)-y(t))$, the
controller should operate on the predicted output error,
$(w(t)-y(t+\tau))$. The Smith predictor offers a novel approach
for the realization of the output predictor, which with
appropriate design of the controller results in a control
performance free from the effects of time delay. This can be
achieved by adding a negative feedback loop around the
controller with $G_{pm}(1-G_{dm})$ as the feedback element, where
$G_{pm}$ and $G_{dm}$ are the modelled process transfer function ($G_p$)
and modelled time delay ($G_d$) respectively. The overall
closed-loop realization is shown in Fig. 12B.2. Using Laplace
transforms, the effective controller C of Fig. 12B.1, in the
new configuration becomes C*, where

$$C^*(s) = \frac{C(s)}{1 + C(s)\,G_{pm}(s)\,(1 - \exp(-s\tau_m))}, \tag{12B.1}$$

and the overall closed-loop system (Fig. 12B.2) is given by

$$\frac{y}{w} = \frac{C G_p G_d}{1 + C G_{pm} - C G_{pm} G_{dm} + C G_p G_d}. \tag{12B.2}$$

If $G_{pm} = G_p$ and $G_{dm} = G_d$, and if the disturbance is zero,
(12B.2) reduces to

$$\frac{y}{w} = \frac{C G_p G_d}{1 + C G_p}. \tag{12B.3}$$

Note that (i) the closed-loop characteristic equation
(12B.3) becomes free from the time-delay term, and (ii) y*
in Fig. 12B.2 is effectively the prediction $y(t+\tau)$, if $G_{pm}$
and $G_{dm}$ are correctly modelled.

Figure 12B.2 The Smith predictor control scheme.

Some of the drawbacks of the Smith predictor scheme are
as follows.

(a) A delay-free process model and a delay model are
required. The scheme will not be able to stabilize open-loop
unstable processes.

(b) The control performance is dependent on the modelling
accuracy of $G_{pm}$ and $G_{dm}$. The control performance is expected
to be more sensitive to the drift in system parameters than
its delay-free counterpart.

(c) The implementation of the modelled time delay can be
difficult.
If the discrete form of operation is permissible, most of
the above drawbacks disappear. A discrete-time realization
of the Smith predictor is shown in Fig. 12B.3.

Figure 12B.3 The discrete-time Smith predictor control
scheme.

Here, $A(q^{-1})$ and $B(q^{-1})$ are polynomials in discrete
time; $q^{-1}$ is the unit backward shift operator, and d is the
time delay in discrete sampling time intervals. The
subscript m stands for modelled (or estimated) quantities.

Under certain conditions the Smith predictor control
scheme can be shown to be equivalent to the self-tuning
controller, as shown in Gawthrop (1977). Here again y* can be
interpreted as the $d_m$-step ahead prediction $y(k+d_m|k)$, where
$d_m$ is the modelled time delay. The drifts in parameter
variation can be easily taken care of by on-line parameter
estimation.

For further discussions on the Smith predictor, see
Marshall (1979).
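
A discrete-time Smith predictor loop can be sketched in a few lines. Everything specific in this sketch is an assumption made for illustration: a first-order process y(k+1) = a y(k) + b u(k-d), a perfectly known model, a PI controller in incremental form, and the numerical values of a, b, d, kp and ki. It is not the particular design of Fig. 12B.3.

```python
import numpy as np

def smith_pi_loop(a=0.9, b=0.1, d=5, kp=2.0, ki=0.5, w=1.0, n_steps=300):
    """Simulate a PI controller with a Smith predictor (perfect model assumed).

    Process:          y(k+1)  = a*y(k)  + b*u(k-d)
    Delay-free model: ym(k+1) = a*ym(k) + b*u(k)
    Feedback signal:  y*(k)   = y(k) + ym(k) - ym(k-d)
    """
    y = np.zeros(n_steps + 1)     # measured process output
    ym = np.zeros(n_steps + 1)    # output of the delay-free model
    u = np.zeros(n_steps)
    e_prev, u_prev = 0.0, 0.0
    for k in range(n_steps):
        ym_delayed = ym[k - d] if k >= d else 0.0
        y_star = y[k] + ym[k] - ym_delayed          # predicted output
        e = w - y_star
        u[k] = u_prev + kp * (e - e_prev) + ki * e  # incremental PI law
        u_delayed = u[k - d] if k >= d else 0.0
        y[k + 1] = a * y[k] + b * u_delayed
        ym[k + 1] = a * ym[k] + b * u[k]
        e_prev, u_prev = e, u[k]
    return y, ym, u

y, ym, u = smith_pi_loop()
print("final output:", y[-1])
```

With a perfect model the delay-free model output ym(k) coincides with the process output d samples later, so the controller effectively acts on a delay-free loop, which is the property stated in note (i) above for the continuous-time case.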
DERIVATION OF STATE-SPACE DETERMINISTIC LQ CONTROL

In this appendix the state-space formulation of the 
deterministic LQ control problem is presented. A multi-input 
multi-output process is considered. The derivation is based 
on dynamic programming, which was developed by Bellman in 
1953 (Bellman and Dreyfus, 1962). The derivation can be 
easily simplified to the single-input single-output case. 

Consider the deterministic model

$$x(k+1) = Ax(k) + Bu(k), \tag{13A.1}$$

$$y(k) = Cx(k), \tag{13A.2}$$

where x is an n×1 state vector, y is the r×1 vector of measured
outputs, u is the m×1 vector of deterministic control
inputs; A, B, and C are n×n, n×m, and r×n real matrices
respectively, which are assumed to be known. The initial
state x(0) is known.

The objective is to produce the optimal control
decisions u(k), u(k+1),..., u(k+N-1), so as to minimize the
scalar cost function

$$J = x^T(k+N)\,Q_N\,x(k+N) + \sum_{i=k}^{k+N-1}\left(x^T(i)\,Q_{N-1}\,x(i) + u^T(i)\,\Lambda\,u(i)\right), \tag{13A.3}$$

where $Q_N$ and $Q_{N-1}$ are symmetric positive semidefinite
matrices and $\Lambda$ is a positive definite matrix.

Let the cost at the last stage of the horizon (k, k+N)
be defined as

$$J_N = x^T(k+N)\,P(k+N)\,x(k+N), \tag{13A.4}$$

where $P(k+N) = Q_N$. Similarly the total cost over the last
two stages (k+N-1 and k+N) is given by

$$J_{N-1} = \min_{u(k+N-1)}\left\{x^T(k+N-1)\,Q_{N-1}\,x(k+N-1) + u^T(k+N-1)\,\Lambda\,u(k+N-1) + J_N\right\}. \tag{13A.5}$$

From (13A.1),

$$x(k+N) = Ax(k+N-1) + Bu(k+N-1). \tag{13A.6}$$

Substituting for $J_N$ in (13A.5) using (13A.4) and (13A.6),

$$J_{N-1} = \min_{u(k+N-1)}\left\{x^T(k+N-1)\,Q_{N-1}\,x(k+N-1) + u^T(k+N-1)\,\Lambda\,u(k+N-1) + (Ax(k+N-1) + Bu(k+N-1))^T Q_N (Ax(k+N-1) + Bu(k+N-1))\right\}. \tag{13A.7}$$
The minimization of $J_{N-1}$ with respect to $u(k+N-1)$ implies

$$\frac{\partial J_{N-1}}{\partial u(k+N-1)} = 0.$$

That is

$$u^T(k+N-1)\Lambda + (Ax(k+N-1) + Bu(k+N-1))^TP(k+N)B = 0. \tag{13A.8}$$

Since the control weighting matrix $\Lambda$ and $P(k+N)$ are symmetric,
the optimal control input at stage $(k+N-1)$ is given by

$$u(k+N-1) = -[\Lambda + B^TP(k+N)B]^{-1}B^TP(k+N)Ax(k+N-1) = -K(k+N-1)x(k+N-1), \tag{13A.9}$$

where

$$K(k+N-1) = [\Lambda + B^TP(k+N)B]^{-1}B^TP(k+N)A. \tag{13A.10}$$

Substituting for $u(k+N-1)$ in (13A.7) from (13A.9) for
minimum $J_{N-1}$,

$$J_{N-1} = x^T(k+N-1)\left[Q_{N-1} + K^T(k+N-1)\Lambda K(k+N-1) + (A - BK(k+N-1))^TP(k+N)(A - BK(k+N-1))\right]x(k+N-1). \tag{13A.11}$$

Define the term within the bracket as $P(k+N-1)$:

$$P(k+N-1) = Q_{N-1} + K^T(k+N-1)\Lambda K(k+N-1) + (A - BK(k+N-1))^TP(k+N)(A - BK(k+N-1)).$$

Using (13A.9), and writing $K$ for $K(k+N-1)$ and $P$ for $P(k+N)$,

$$P(k+N-1) = Q_{N-1} + K^T(\Lambda + B^TPB)K + A^TPA - K^TB^TPA - A^TPBK$$

$$= Q_{N-1} + A^TP(k+N)A - A^TP(k+N)B[\Lambda + B^TP(k+N)B]^{-1}B^TP(k+N)A. \tag{13A.12}$$

So, $J_{N-1}$ in (13A.11) can now be expressed as

$$J_{N-1} = x^T(k+N-1)P(k+N-1)x(k+N-1). \tag{13A.13}$$

Note that equation (13A.13) is the same as (13A.4) except
for the time index, which is reduced by 1 in (13A.13).


Similarly the total cost over the last three stages
($k+N-2$, $k+N-1$, $k+N$) is given by

$$J_{N-2} = \min_{u(k+N-2)}\left[\left\{x^T(k+N-2)Q_{N-2}x(k+N-2) + u^T(k+N-2)\Lambda u(k+N-2)\right\} + \min_{u(k+N-1)}\{J_{N-1}\}\right]. \tag{13A.14}$$

Since

$$x(k+N-1) = Ax(k+N-2) + Bu(k+N-2),$$

using (13A.13), and substituting for $J_{N-1}$ in (13A.14),

$$J_{N-2} = \min_{u(k+N-2)}\left\{x^T(k+N-2)Q_{N-2}x(k+N-2) + u^T(k+N-2)\Lambda u(k+N-2) + (Ax(k+N-2) + Bu(k+N-2))^TP(k+N-1)(Ax(k+N-2) + Bu(k+N-2))\right\}. \tag{13A.15}$$

Note that equation (13A.15) is the same as (13A.7) except for the
time index, which is reduced by 1 in (13A.15). So, as in
(13A.8),

$$\frac{\partial J_{N-2}}{\partial u(k+N-2)} = 0.$$

That is

$$u^T(k+N-2)\Lambda + (Ax(k+N-2) + Bu(k+N-2))^TP(k+N-1)B = 0.$$

Since $\Lambda$ and $P(k+N-1)$ are symmetric, the optimal control law
at stage $(k+N-2)$ is

$$u(k+N-2) = -[\Lambda + B^TP(k+N-1)B]^{-1}B^TP(k+N-1)Ax(k+N-2) = -K(k+N-2)x(k+N-2), \tag{13A.16}$$

where

$$K(k+N-2) = [\Lambda + B^TP(k+N-1)B]^{-1}B^TP(k+N-1)A. \tag{13A.17}$$

Thus starting from the prespecified terminal condition
$P(k+N) = Q_N$, the optimal control law can be computed through
backward recursions from one stage to the next, with each
stage having identical structure, until the present stage is
reached. At each stage $k$ ($i \le k < i+N$, $i$ being the present
time), the gain matrix $K(k)$ is computed using $P(k+1)$ and the
control $u(k)$ is determined; $P(k)$ is next computed to be used
in the next stage $k-1$.

Summing up, the general solution of the deterministic
optimal control problem is given by

$$K(k) = [\Lambda + B^TP(k+1)B]^{-1}B^TP(k+1)A, \tag{13A.18}$$

$$P(k) = Q_k + A^TP(k+1)A - A^TP(k+1)B[\Lambda + B^TP(k+1)B]^{-1}B^TP(k+1)A, \tag{13A.19}$$

$$u(k) = -K(k)x(k), \tag{13A.20}$$

where $P(k+N) = Q_N$.
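The backward recursion (13A.18)-(13A.20) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the process has a single input (so the control weight in (13A.3) is a scalar `lam` and the matrix inverse reduces to a division), the stage weight `Q` is the same at every stage, and the names `lq_backward`, `mat_mul` and `transpose` are hypothetical helpers, not part of the text.

```python
def mat_mul(X, Y):
    """Plain-list matrix product X*Y."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def lq_backward(A, b, Q, lam, Q_N, N):
    """Backward recursion for a single-input process x(k+1) = A x(k) + b u(k).

    Assumptions (not from the text): scalar control weight lam > 0 and a
    constant stage weight Q. Returns (gains, P) where gains[0] is K(k) for
    the present stage and P is P(k).
    """
    n = len(A)
    P = [row[:] for row in Q_N]              # terminal condition P(k+N) = Q_N
    gains = []
    for _ in range(N):
        Pb = [sum(P[i][j] * b[j] for j in range(n)) for i in range(n)]
        s = lam + sum(b[i] * Pb[i] for i in range(n))       # lam + b'Pb
        bPA = [sum(Pb[i] * A[i][j] for i in range(n)) for j in range(n)]
        K = [val / s for val in bPA]                        # gain, as in (13A.18)
        AtPA = mat_mul(mat_mul(transpose(A), P), A)
        # P update, as in (13A.19); A'Pb is the transpose of b'PA (P symmetric)
        P = [[Q[i][j] + AtPA[i][j] - bPA[i] * K[j] for j in range(n)]
             for i in range(n)]
        gains.append(K)
    gains.reverse()                          # gains[0] now belongs to stage k
    return gains, P


if __name__ == "__main__":
    gains, P = lq_backward([[1.0]], [1.0], [[1.0]], 1.0, [[1.0]], 1)
    print(gains, P)
```

For the scalar case in the usage lines (all quantities equal to 1 and a one-step horizon) the recursion gives K(k) = 0.5 and P(k) = 1.5, which can be checked by hand from (13A.18) and (13A.19).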

Reference

[1] Bellman, R., and S.E. Dreyfus (1962): Applied Dynamic
Programming, Princeton University Press, Princeton, N.J.
TRANSMITTANCE MATRIX:

FORMULATION AND IMPLEMENTATION

13B.1 Introduction

Use of transmittance matrices offers a straightforward
method for simplifying a polynomial matrix multiplication
problem into an ordinary matrix multiplication problem. The
transmittance matrix formulation can be very useful in
performing state estimation (Lam, 1982; Clarke et al, 1985).
This appendix discusses the vector formulation of the
transmittance matrix and its application for state
estimation; a FORTRAN implementation is also presented.

13B.2 Problem Statement 

The objective is to compute the transmittance matrix $M_g$ of
$\mathbf{g}(k-1)$ using the following decomposition

$$[I - q^{-1}F]^{-1}vg(k-1) = [qI - F]^{-1}vg(k) = M_g\mathbf{g}(k-1)/C(q^{-1}), \tag{13B.1}$$

where the $m \times m$ matrix $F$ and the $m$-vector $v$ are given by

$$F = \begin{bmatrix} -c_1 & 1 & 0 & \cdots & 0 \\ -c_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ -c_{m-1} & 0 & 0 & \cdots & 1 \\ -c_m & 0 & 0 & \cdots & 0 \end{bmatrix}, \qquad v = [v_1 \; v_2 \; \ldots \; v_m]^T,$$

with $c_i = 0$ for $i > n$,
$C(q^{-1}) = 1 + c_1q^{-1} + \ldots + c_nq^{-n}$, and
$\mathbf{g}(k-1)$ is the stacked vector of $g(k)$, given by
$\mathbf{g}(k-1) = [g(k-1) \; g(k-2) \; \ldots \; g(k-m)]^T$.

Remarks

(a) The transmittance matrix $M_g$ is a symmetric matrix.

(b) $M_g$ in (13B.1) is a function of the elements of the
vector $v$ and the parameters of $C(q^{-1})$.

(c) The actual row (or column) size $m$ of $M_g$ = max (order of
$C(q^{-1})$, the size of $v$ without tail-end zeros). In other
words, for implementational purposes, $M_u$ and $M_y$ in (13.6.7)
need not be of the same size.
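13B.3 Solution of $[qI - F]^{-1}$

$$[qI - F]^{-1} = \mathrm{Adjoint}\,[qI - F]/D(q), \tag{13B.2}$$

where the $m \times m$ matrix

$$[qI - F] = \begin{bmatrix} q + c_1 & -1 & 0 & \cdots & 0 \\ c_2 & q & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & q & -1 \\ 0 & 0 & \cdots & 0 & q \end{bmatrix}$$

and $D(q)$, the determinant of $[qI - F]$, is given by

$$D(q) = q^m + c_1q^{m-1} + \ldots + c_nq^{m-n} = q^mC(q^{-1}). \tag{13B.3}$$

It is known (Cadzow and Martens, 1970) that

$$\mathrm{Adjoint}\,[qI - F] = Iq^{m-1} + H_1q^{m-2} + \ldots + H_{m-1} = q^m[Iq^{-1} + H_1q^{-2} + H_2q^{-3} + \ldots + H_{m-1}q^{-m}], \tag{13B.4}$$

where

$$\begin{aligned}
H_1 &= F + p_1I, & p_1 &= -\mathrm{trace}(F) = c_1, \\
H_2 &= FH_1 + p_2I, & p_2 &= -\tfrac{1}{2}\,\mathrm{trace}(FH_1) = c_2, \\
H_3 &= FH_2 + p_3I, & p_3 &= -\tfrac{1}{3}\,\mathrm{trace}(FH_2) = c_3, \\
&\;\;\vdots \\
H_m &= FH_{m-1} + p_mI, & p_m &= -\tfrac{1}{m}\,\mathrm{trace}(FH_{m-1}) = c_m.
\end{aligned} \tag{13B.5}$$

The minus signs follow from $\mathrm{trace}(F) = -c_1$ for the matrix $F$ above.
In the present case $p_{n+1} = p_{n+2} = \ldots = p_m = 0$, where $n$ =
degree of $C(q^{-1})$. Following (13B.2 - 13B.5),

$$[qI - F]^{-1} = [Iq^{-1} + H_1q^{-2} + \ldots + H_{m-1}q^{-m}]/C(q^{-1}) = M/C(q^{-1}), \tag{13B.6}$$

where

$$M = \begin{bmatrix}
q^{-1} & q^{-2} & q^{-3} & \cdots & & q^{-m} \\
s_2 & q^{-1}r_2 & q^{-2}r_2 & \cdots & & q^{-m+1}r_2 \\
qs_3 & s_3 & q^{-1}r_3 & \cdots & & q^{-m+2}r_3 \\
\vdots & & & \ddots & & \vdots \\
q^{n-2}s_n & q^{n-3}s_n & \cdots & q^{-1}r_n & \cdots & q^{-m+n-1}r_n \\
0 & 0 & \cdots & & q^{-1}r_{n+1} \;\cdots & q^{-m+n}r_{n+1} \\
\vdots & & & & \ddots & \vdots \\
0 & 0 & \cdots & & & q^{-1}r_{n+1}
\end{bmatrix}, \tag{13B.7a}$$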
$$r_i = 1 + c_1q^{-1} + \ldots + c_{i-1}q^{-(i-1)}, \quad \text{for } i = 2, 3, \ldots, n+1,$$

$$r_{n+2} = r_{n+3} = \ldots = r_{n+1},$$

$$s_i = r_i - C(q^{-1}),$$

$$s_{n+1} = s_{n+2} = \ldots = 0. \tag{13B.7b}$$

The derivation of the transmittance matrix from $M$ can be
illustrated as follows.


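Example 13B.3 The state-space representation of the process

$$y(k) + a_1y(k-1) + a_2y(k-2) = b_0u(k-1) + b_1u(k-2) + b_2u(k-3) + e(k) + c_1e(k-1) + c_2e(k-2),$$

is given by

$$x(k+1) = \begin{bmatrix} -a_1 & 1 & 0 \\ -a_2 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}x(k) + \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}u(k) + \begin{bmatrix} c_1 - a_1 \\ c_2 - a_2 \\ 0 \end{bmatrix}e(k),$$

$$y(k) = [1 \; 0 \; 0]\,x(k) + e(k);$$

find $M_u$, where

$$[qI - F]^{-1}bu(k) = M_u\mathbf{u}(k-1)/C(q^{-1}), \quad \text{and} \quad F = [A - ec^T]. \tag{13B.8}$$

Here $n = 2$, $k = 1$, $m = 3$.

$$[qI - F] = \begin{bmatrix} q + c_1 & -1 & 0 \\ c_2 & q & -1 \\ 0 & 0 & q \end{bmatrix},$$

$$D(q) = q^3(1 + c_1q^{-1} + c_2q^{-2}),$$

$$[qI - F]^{-1} = [Iq^{-1} + H_1q^{-2} + H_2q^{-3}]/C(q^{-1}),$$

$$H_1 = F + p_1I, \quad p_1 = -\mathrm{trace}(F) = c_1,$$

$$H_2 = FH_1 + p_2I, \quad p_2 = -\tfrac{1}{2}\,\mathrm{trace}(FH_1) = c_2.$$

That is

$$H_1 = \begin{bmatrix} 0 & 1 & 0 \\ -c_2 & c_1 & 1 \\ 0 & 0 & c_1 \end{bmatrix}, \quad H_2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & c_1 \\ 0 & 0 & c_2 \end{bmatrix}. \tag{13B.9}$$

Following (13B.6),

$$[qI - F]^{-1} = M/C(q^{-1}), \text{ where}$$

$$M = \begin{bmatrix} q^{-1} & q^{-2} & q^{-3} \\ -c_2q^{-2} & q^{-1} + c_1q^{-2} & q^{-2} + c_1q^{-3} \\ 0 & 0 & q^{-1} + c_1q^{-2} + c_2q^{-3} \end{bmatrix}. \tag{13B.10}$$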
Alternatively, using (13B.7) directly,

$$r_2 = 1 + c_1q^{-1}, \quad r_3 = 1 + c_1q^{-1} + c_2q^{-2},$$

$$s_2 = -c_2q^{-2}, \quad s_3 = 0,$$

$$M = \begin{bmatrix} q^{-1} & q^{-2} & q^{-3} \\ s_2 & q^{-1}r_2 & q^{-2}r_2 \\ qs_3 & s_3 & q^{-1}r_3 \end{bmatrix},$$

which is the same as (13B.10).
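The recursion used in this example for the matrices H_1, H_2, ... and the scalars p_1, p_2, ... can be checked numerically. The sketch below is an assumption-laden illustration rather than the book's program: it uses exact rational arithmetic from Python's standard `fractions` module, takes the sign convention p_i = -(1/i) trace(F H_{i-1}) with H_0 = I (so that p_1 = c_1 for the matrix F of the example), and the function name `faddeev_leverrier` is hypothetical.

```python
from fractions import Fraction


def faddeev_leverrier(F):
    """Return ([p_1..p_m], [H_1..H_m]) for a square matrix F.

    H_i = F*H_{i-1} + p_i*I with H_0 = I and p_i = -(1/i)*trace(F*H_{i-1}).
    The last matrix H_m is expected to be the zero matrix.
    """
    m = len(F)
    F = [[Fraction(x) for x in row] for row in F]
    H = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]  # H_0 = I
    ps, Hs = [], []
    for i in range(1, m + 1):
        FH = [[sum(F[r][t] * H[t][c] for t in range(m)) for c in range(m)]
              for r in range(m)]
        p = -sum(FH[d][d] for d in range(m)) / i
        H = [[FH[r][c] + (p if r == c else 0) for c in range(m)]
             for r in range(m)]
        ps.append(p)
        Hs.append(H)
    return ps, Hs


if __name__ == "__main__":
    # Illustrative values for c_1 and c_2 (assumed, not from the text).
    c1, c2 = Fraction(1, 2), Fraction(1, 5)
    F = [[-c1, 1, 0], [-c2, 0, 1], [0, 0, 0]]
    ps, Hs = faddeev_leverrier(F)
    print(ps)
    for H in Hs:
        print(H)
```

With the 3x3 matrix F of the example, the computed scalars are c_1, c_2 and 0, and the third matrix returned is the zero matrix, as expected from the Cayley-Hamilton theorem.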

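Now let us consider a general expression for

$$[qI - F]^{-1}vg(k), \quad \text{where } v = [v_1 \; v_2 \; v_3]^T.$$

Using (13B.10),

$$[qI - F]^{-1}vg(k) = \frac{1}{C(q^{-1})}\begin{bmatrix} q^{-1} & q^{-2} & q^{-3} \\ -c_2q^{-2} & q^{-1} + c_1q^{-2} & q^{-2} + c_1q^{-3} \\ 0 & 0 & q^{-1} + c_1q^{-2} + c_2q^{-3} \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}g(k)$$

$$= \frac{1}{C(q^{-1})}\begin{bmatrix} v_1q^{-1} + v_2q^{-2} + v_3q^{-3} \\ v_2q^{-1} + (c_1v_2 - c_2v_1 + v_3)q^{-2} + c_1v_3q^{-3} \\ v_3q^{-1} + c_1v_3q^{-2} + c_2v_3q^{-3} \end{bmatrix}g(k)$$

$$= \frac{1}{C(q^{-1})}\begin{bmatrix} v_1 & v_2 & v_3 \\ v_2 & v_3 + c_1v_2 - c_2v_1 & c_1v_3 \\ v_3 & c_1v_3 & c_2v_3 \end{bmatrix}\begin{bmatrix} g(k-1) \\ g(k-2) \\ g(k-3) \end{bmatrix}. \tag{13B.11}$$

Thus

$$[qI - F]^{-1}vg(k) = M_g\mathbf{g}(k-1)/C(q^{-1}),$$

where

$$\mathbf{g}(k-1) = [g(k-1) \; g(k-2) \; g(k-3)]^T, \tag{13B.12}$$

and the $3 \times 3$ matrix $M_g$, defined as the transmittance matrix, is
given by (13B.11).

For the present case (13B.8), since

$$b = [b_0 \; b_1 \; b_2]^T \quad \text{and} \quad [qI - F]^{-1}bu(k) = M_u\mathbf{u}(k-1)/C(q^{-1}),$$

the transmittance matrix $M_u$ is given by

$$M_u = \begin{bmatrix} b_0 & b_1 & b_2 \\ b_1 & b_2 + c_1b_1 - c_2b_0 & c_1b_2 \\ b_2 & c_1b_2 & c_2b_2 \end{bmatrix},$$

with

$$\mathbf{u}(k-1) = [u(k-1) \; u(k-2) \; u(k-3)]^T.$$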
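The pattern of the 3x3 result can be turned into a small routine for any size. The sketch below is not the book's algorithm: it uses an elementwise expression, derived here from the structure of M and the definitions of r_i and s_i, in which entry (i, j) collects the products v_l c_t with l + t = i + j - 1 (c_0 = 1, c_t = 0 beyond the degree of C), added for l >= max(i, j) and subtracted for l < min(i, j). The function name `transmittance_matrix` and the integer test values are hypothetical.

```python
def transmittance_matrix(c, v):
    """Full m x m transmittance matrix for C(q^-1) = 1 + c[0] q^-1 + ... and vector v.

    m = max(len(c), len(v)); v is padded with tail-end zeros if needed.
    Entry (i, j), 1-based, is
        sum_{l = max(i,j)}^{min(m, i+j-1)} v_l * c_{i+j-1-l}
      - sum_{l = 1}^{min(i,j)-1}           v_l * c_{i+j-1-l}
    (an elementwise form assumed for this sketch).
    """
    m = max(len(c), len(v))
    cc = [1] + list(c) + [0] * (2 * m)      # cc[t] = c_t with c_0 = 1, zero padded
    vv = list(v) + [0] * (m - len(v))
    M = [[0] * m for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            total = 0
            for l in range(max(i, j), min(m, i + j - 1) + 1):
                total += vv[l - 1] * cc[i + j - 1 - l]
            for l in range(1, min(i, j)):
                total -= vv[l - 1] * cc[i + j - 1 - l]
            M[i - 1][j - 1] = total
    return M


if __name__ == "__main__":
    # c_1 = 2, c_2 = 3 and v = [1, 2, 4] are arbitrary illustrative integers.
    for row in transmittance_matrix([2, 3], [1, 2, 4]):
        print(row)
```

For c_1 = 2, c_2 = 3 and v = [1, 2, 4], the middle entry is v_3 + c_1 v_2 - c_2 v_1 = 4 + 4 - 3 = 5, and the remaining entries follow the 3x3 pattern of the example, giving rows [1, 2, 4], [2, 5, 8] and [4, 8, 12].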

13B.4 Structure of the Transmittance Matrix 

As shown in the last example, the transformation

$$[qI - F]^{-1}vg(k) = [I - q^{-1}F]^{-1}vg(k-1) = M_g\mathbf{g}(k-1)/C(q^{-1})$$

converts a polynomial matrix multiplication problem into an
ordinary matrix multiplication problem; $1/C(q^{-1})$ is handled
separately. The transmittance matrix $M_g$ is of the same size
as $F$. The transmittance matrix is symmetric and has a
general structure as follows



13B.5 Implementation 

In the transformation

$$[qI - F]^{-1}vg(k) = M_g\mathbf{g}(k-1)/C(q^{-1}),$$

the transmittance matrix $M_g$ is given by (13B.13), the
parameter vector $v$ and the stacked vector $\mathbf{g}(k-1)$ are given by

$$v = [v_1 \; v_2 \; \ldots \; v_m]^T,$$

$$\mathbf{g}(k-1) = [g(k-1) \; g(k-2) \; \ldots \; g(k-m)]^T.$$

The transmittance matrix being symmetric, the diagonal and
the upper triangular parts are stored in a vector TM() of
size $m(m+1)/2$ as shown below.
13B.6 FORTRAN Mechanization

The FORTRAN mechanization has two parts:

(i) formation of the transmittance matrix, and

(ii) multiplication of the transmittance matrix by a vector.

Inputs   V()       parameter vector v
         C(),IC    C vector of size IC containing parameters
                   c_1, c_2, ... of C(q^-1)
         ITM       Size m of the transmittance matrix stored in TM(),
                   >= max{degree of C(q^-1), size of V()}
         G()       Stacked vector of process variable, post-
                   multiplying TM, e.g., Δu(k-1) in (13.6.34).
         PXNEW()   Zero vector initially;
                   after 1st pass, = TM()*G() and so on.

Outputs  TM()      Linear array of transmittance matrix elements
         PXNEW()   updated by additional set of TM()*G().

Comment: TM() stores the diagonal and upper triangular part
of the transmittance matrix.

      IC1 = IC + 1
      N = ITM + 1
      MTOTAL = (ITM*N)/2
      DO 2 I = 3, MTOTAL
    2 TM(I) = 0.0
      K = (ITM+1)/2
      L = 0
      DO 4 I = 1, K
      L = L + I
      IW = L
      DO 4 J = I, N - I
      TM(IW) = V(I+J-1)
    4 IW = IW + J

Comment: Now c_i*v_j elements are placed. Start from the last
column of the 2nd row and proceed m-1 steps down the
column; next start from last but one column of the 2nd row
and proceed m-2 steps down the column etc.

      ITER = ITM/2
      M = MTOTAL + N
      DO 6 L = 1, ITER
      M = M + L - N
      ILIM = N - L - L
      K = M
      DO 6 I = 1, ILIM
      K = K - ITM + I + L - 2
      NL = N - I
      DO 6 J = 1, ILIM + 1 - I
      IW = K + J
      IF (J.LT.IC1) TM(IW) = TM(IW) + C(J) * V(NL)
      IF (NL.LT.IC1) TM(IW) = TM(IW) - C(NL) * V(J)
    6 CONTINUE

Comment: Now compute PXNEW() = (transmittance matrix) x
(stacked vector) + earlier PXNEW(). For example consider
computation of (13.6.12):

   C(q^-1)x(k|k-1) = [M_u Δu(k-1) + M_w Δw(k) + M_e e(k-1)];

(i) first PXNEW(I) is zeroed, (ii) call present subroutine
to compute M_u Δu(k-1) which is added to PXNEW(), and
(iii) repeat (ii) for M_w Δw(k) and for M_e e(k-1). M_u, M_w and
M_e need not be of the same size.

      L = 0
      DO 9 I = 1, ITM
      DO 92 J = 1, I
   92 PXNEW(I) = PXNEW(I) + TM(L + J)*G(J)
      IW = L + I
      DO 94 K = 1, ITM - I
      IW = IW + I + K - 1
   94 PXNEW(I) = PXNEW(I) + TM(IW)*G(I+K)
      L = L + I
    9 CONTINUE
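The second part of the listing, the product of a packed symmetric matrix with a stacked vector, can be illustrated in Python. This is a sketch rather than a translation: it assumes the diagonal and upper triangle are packed column by column (which is how the listing appears to index TM()), uses 0-based indices, and the names `pack_upper` and `packed_sym_matvec` are hypothetical.

```python
def pack_upper(M):
    """Pack the diagonal and upper triangle of a square matrix column by column."""
    m = len(M)
    return [M[i][j] for j in range(m) for i in range(j + 1)]


def packed_sym_matvec(tm, g, px=None):
    """Return px + S*g, where S is the symmetric matrix packed in tm.

    Element (r, c) with r <= c (0-based) is stored at tm[c*(c+1)//2 + r];
    entries below the diagonal are read from their mirror position.
    """
    m = len(g)
    out = [0] * m if px is None else list(px)
    for i in range(m):
        for j in range(m):
            r, c = (i, j) if i <= j else (j, i)
            out[i] += tm[c * (c + 1) // 2 + r] * g[j]
    return out


if __name__ == "__main__":
    M = [[1, 2, 4], [2, 5, 8], [4, 8, 12]]     # illustrative symmetric matrix
    tm = pack_upper(M)
    px = packed_sym_matvec(tm, [1, 1, 1])      # first pass, starting from zero
    px = packed_sym_matvec(tm, [1, 0, -1], px) # accumulate a second product
    print(tm, px)
```

Passing the previous result back in as `px` mirrors the way PXNEW() is accumulated over several calls in the listing.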
COVARIANCE TIME UPDATE USING U-D FACTORIZATION

13C.1 Introduction

U-D covariance factorization is attractive because of
numerical robustness, algorithmic simplicity and
computational efficiency. Bierman (1977, p.124) studies the general
covariance time update problem:

$$P = \Phi P^*\Phi^T + GQG^T, \tag{13C.1}$$

where the U-D factors of $P^*$, $P^* = U^*D^*U^{*T}$, are given and the
updated U-D factors are computed. A particular form of the
U-D covariance update problem relating to the LQ state-space
controller (Sec.13.7) is discussed in Clarke et al (1985),
where the vector implementation is also presented; the
material presented here is largely based on this reference.

13C.2 The Problem 

$$P = A^TP^*A + gqg^T, \quad P^* = U^*D^*U^{*T} \tag{13C.2a}$$

$$= UDU^T, \tag{13C.2b}$$

where $A$ is in observable canonical form, and $g = [1 \; 0 \ldots 0]^T$;
the objectives are

(a) to compute updated factors, $U$ and $D$, given prior
covariance factors, $U^*$ and $D^*$, where $D$ and $D^*$ are diagonal and $U$
and $U^*$ are unit upper triangular matrices, and

(b) to obtain the vector implementation for the covariance
U-D time update (13C.2).

13C.3 Summary of Algorithm 

The factorization problem may be reformulated as follows.
The updated $n \times n$ covariance matrix $P$ is given by

$$P(k) = [g \;\; A^TU^*]\,\mathrm{diag}\{q, D^*\}\,[g \;\; A^TU^*]^T$$

$$= W\bar{D}W^T, \quad \text{i.e., } W = [g \;\; A^TU^*], \; \bar{D} = \mathrm{diag}\{q, D^*\}, \tag{13C.3}$$

$$= UDU^T, \tag{13C.4}$$

where the square matrix $\bar{D}$ is $m \times m$ diagonal, $D$ is $n \times n$ diagonal
and $W$ is an $n \times m$ matrix, $m > n$.

One of the efficient ways of U-D factorization in the
present case is by using the modified weighted Gram-Schmidt
(mWGS) orthogonalization procedure. The basic idea is to
decompose $W$ in (13C.3) such that

$$\begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ w_n^T \end{bmatrix} = U\begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix}, \tag{13C.5}$$

where $v_i$ are weighted orthogonal $m$-vectors. The elements
($D_j$) of the diagonal matrix $D$ are computed from

$$v_i^T\bar{D}v_j = D_j\delta_{ij}, \quad \delta_{ij} = 0 \text{ if } i \ne j, \quad \delta_{ij} = 1 \text{ if } i = j. \tag{13C.6}$$

Backward iteration is used to compute the $U$ and $D$ factors.
First $v_n$ is defined as $w_n$ and then from each lower vector
$v_i$ ($= w_i$), ($i = n-1, \ldots, 1$), the part that is orthogonal to
$v_n$ is extracted and this process continues individually
through all the lower vectors $v_i$.

For $j = n, n-1, \ldots, 1$, initialize $v_j^{(i)} = w_j$, $i = n$, where the
superscript $i$ represents the $(n+1-i)$th stage of the
computational process. For $j = n, n-1, \ldots, 2$, iterate
through the following steps.

$$D_j = (v_j^{(j)})^T\bar{D}v_j^{(j)},$$

$$U_{kj} = (v_k^{(j)})^T\bar{D}v_j^{(j)}/D_j, \quad k = 1, \ldots, j-1,$$

and carry over

$$v_k^{(j-1)} = v_k^{(j)} - U_{kj}v_j^{(j)}, \quad k = 1, \ldots, j-1.$$

Finally, $D_1 = (v_1^{(1)})^T\bar{D}v_1^{(1)}$.
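The backward mWGS iteration above can be written out as a short Python sketch. It works on the full (unpacked) matrix W for readability, whereas the vector implementation discussed in this appendix stores only the upper triangular elements; the function name `mwgs` and the numerical values in the usage lines are hypothetical, and every computed D_j is assumed to be strictly positive.

```python
def mwgs(W, dbar):
    """Modified weighted Gram-Schmidt factorization.

    W    : n x m matrix (list of rows), m >= n
    dbar : the m diagonal weights (the diagonal of the m x m weighting matrix)
    Returns (U, D) with U unit upper triangular (n x n) and D a list of n
    diagonal elements such that W*diag(dbar)*W' = U*diag(D)*U'.
    """
    n, m = len(W), len(W[0])
    V = [row[:] for row in W]                        # v_j initialised to w_j
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n - 1, -1, -1):                   # backward over the rows
        D[j] = sum(dbar[t] * V[j][t] ** 2 for t in range(m))
        for k in range(j):                           # all lower-numbered vectors
            U[k][j] = sum(V[k][t] * dbar[t] * V[j][t] for t in range(m)) / D[j]
            V[k] = [V[k][t] - U[k][j] * V[j][t] for t in range(m)]
    return U, D


if __name__ == "__main__":
    # Illustrative 3 x 4 matrix with the zero/unit pattern of W = [g  A'U*].
    W = [[1.0, -0.5, -0.3, 0.1],
         [0.0, 1.0, 0.2, 0.4],
         [0.0, 0.0, 1.0, 0.7]]
    dbar = [1.0, 2.0, 0.5, 0.25]
    U, D = mwgs(W, dbar)
    print(U)
    print(D)
```

A simple check of the result is to rebuild both sides of the factorization and compare them entry by entry.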

13C.4 Implementation 



The FORTRAN implementation in matrix formulation for the
general case (13C.1) is given in Bierman (1977, p.131). The
vector implementation is presented here for the case when $A$
is in the observable canonical form and $g = [1 \; 0 \ldots 0]^T$.

Given the $U^*$ and $D^*$ factors of $P^*$: $P^* = U^*D^*U^{*T}$, the
problem is to compute the updated $U$ and $D$ factors in

$$P = A^TP^*A + gqg^T = [g \;\; A^TU^*]\,\mathrm{diag}\{q, D^*\}\,[g \;\; A^TU^*]^T = W\bar{D}W^T = UDU^T.$$

The implementation has two main parts:

(i) formation of $W$ from $A^TU^*$ and $\bar{D} = \mathrm{diag}\{q, D^*\}$,

(ii) extraction of $U$ and $D$ factors using the mWGS algorithm.
As discussed in Sec. 13.7.2 and shown in (13.6.40)-(13.6.41),
$W$ has a strictly zero lower triangular part with unity upper
diagonal elements. The upper triangular elements of $W$ are
stored in the vector w; for example, for a $3 \times 3$ $P$ matrix, the $W$
and $\bar{D}$ elements are stored in the w and d vectors typically as
follows:

$$W = \begin{bmatrix} 1 & w(1) & w(2) & w(4) \\ 0 & 1 & w(3) & w(5) \\ 0 & 0 & 1 & w(6) \end{bmatrix}, \quad d = [q \;\; d_1^* \;\; d_2^* \;\; d_3^*]. \tag{13C.7}$$

For an $n \times n$ $P$, the necessary size of w is $n(n+1)/2$.

13C.5 FORTRAN Mechanization 

Inputs   A()       unsigned first column of A, for example
                   in (13.6.40): A() = (a_1 a_2 a_3)
         N         size of the covariance matrix
         U(),D()   U* and D* factors of P* to be time updated,
                   stored as vectors
         Q         q as stated in the update equation (13C.2),
                   default = 1.

Outputs  U(),D()   time updated U-D factors of P.

Comment: Upper triangular elements of W are stored in W();
INDEX() is a vector of integer values needed to find the
index for the elements of W as in (13C.7).

      N1 = N + 1
      N2 = N + 2
      NTOTAL = (N*N1)/2
      INDEX(1) = 0
      DO 3 I = 2, N
    3 INDEX(I) = N1 - I + INDEX(I-1)

Comment: Now store the upper triangular part of W in W()
and diag{q, D*} in D().

      W(1) = -A(1)
      D(N1) = D(N)
      KU = 0
      DO 4 J = 2, N
      SI = 0.0
      DO 45 I = 1, J-1
      KU = KU + 1
      W(J+KU) = U(KU)
   45 SI = SI - U(KU)*A(I)
      W(KU+1) = SI - A(J)
      D(N2-J) = D(N1-J)
    4 CONTINUE
      D(1) = Q

Comment: Up to (13C.3) is completed here. Now compute the
updated U and D factors and store in U() and D().

 1000 DO 6 ITER = 1, N
      J = N - ITER + 1
      NJ = N1 - J
      SI = D(J)
      DO 62 K = 1, NJ
      IW = NTOTAL - INDEX(K)
      V(K) = W(IW)
      A(K) = D(N2 - K) * V(K)
   62 SI = SI + V(K) * A(K)

Comment: K indexing starts from the rightmost side of W.

      A(NJ+1) = D(J)
      V(NJ+1) = 1.0
      DNEW(J) = SI
      IF (J.EQ.1) GO TO 6
      IF (SI.LT.1E-15) GO TO 6
      DIV = 1.0 / SI
      NJ = NJ + 1
      JM = J - 1
      DO 64 K = 1, JM
      SI = 0.0
      MLAST = NTOTAL - K
      DO 645 I = 1, NJ
      IW = MLAST - INDEX(I)
  645 SI = SI + W(IW)*A(I)
      SI = SI*DIV
      U(IW) = SI
      DO 64 I = 1, NJ
      IW = MLAST - INDEX(I)
   64 W(IW) = W(IW) - SI*V(I)
    6 NTOTAL = NTOTAL - 1
      DO 8 I = 1, N
      D(I) = DNEW(I)
    8 CONTINUE
LOW-PASS FILTER

A low-pass filter is a frequency domain filter, which allows 
the specified low-frequency part of the signal or data 
sequence to pass through, whereas the higher frequency 
components are attenuated. The objective is to separate from 
the data undesirable high frequency components, which may be 
due to external disturbances or noise. The filter may be 
characterized by the pass-band, the transition band, the 
stop-band and the gain or the pass-band magnitude (see
Fig.14A.1). The smaller the transition region, the sharper
is the separation between the frequency components passed 
and those attenuated. 


Figure 14A.1 Typical frequency response (amplitude) of a
low-pass filter.


If $T$ is the sampling period, that is the time-interval
at which data are received, the maximum frequency component
in the data is $\omega_s/2 = 2\pi/2T$, $\omega_s$ being the sampling frequency.
The cut-off frequency, $\omega_c$, has to be lower than $\omega_s/2$. For
example, for a yearly periodic process with monthly data,
$T = 1/12$; the highest frequency component will be the
bimonthly periodic component. So if only the lower frequency
components are of interest, the low-pass filter may be used
with a pass-band of 0 to $\omega_s/4$.

There are two basic types of digital (or discrete-time)
filters: nonrecursive and recursive. A nonrecursive filter
generates the output from the inputs or time-delayed inputs.
For example,

$$y_f(k) = b_1y(k) + b_2y(k-1) + \ldots + b_ny(k-n). \tag{14A.1}$$

A recursive filter generates the output from the present and
the time-delayed inputs as well as the time-delayed outputs.
For example,

$$y_f(k) = b_1y(k) + b_2y(k-1) + \ldots + b_ny(k-n) - a_1y_f(k-1) - a_2y_f(k-2) - \ldots - a_ny_f(k-n). \tag{14A.2}$$

A nonrecursive filter is also called a finite impulse response
(FIR) filter, whereas the recursive filter is also referred
to as an infinite impulse response (IIR) filter. The impulse
response of a recursive filter takes a long time to die out
because of the AR part of the filter equation in (14A.2).

Figure 14A.2 A first order low-pass filter. Filter schematic with
input $y(k)$ and $H(q^{-1}) = (1 - \alpha)/(1 - \alpha q^{-1})$, $0 < \alpha < 1$.

Remark 

A moving average system is an FIR system, whereas a system
with an autoregressive part (e.g. AR, ARMA systems) is an IIR
system.

□□ 
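The difference between the two filter types shows up directly in their impulse responses. The following is a minimal sketch, assuming zero initial conditions and coefficient lists in which `b[0]` multiplies the present input and `a[0]` multiplies the most recent filtered output; the function names `fir_filter` and `iir_filter` and the coefficient values are illustrative only.

```python
def fir_filter(b, y):
    """Nonrecursive filter: weighted sum of present and past inputs only."""
    return [sum(b[i] * y[k - i] for i in range(len(b)) if k - i >= 0)
            for k in range(len(y))]


def iir_filter(b, a, y):
    """Recursive filter: input terms minus weighted past filtered outputs."""
    yf = []
    for k in range(len(y)):
        acc = sum(b[i] * y[k - i] for i in range(len(b)) if k - i >= 0)
        acc -= sum(a[i - 1] * yf[k - i]
                   for i in range(1, len(a) + 1) if k - i >= 0)
        yf.append(acc)
    return yf


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(fir_filter([0.5, 0.5], impulse))    # vanishes after two samples
    print(iir_filter([0.5], [-0.5], impulse)) # halves at every step, never exactly zero
```

The two-tap moving average responds to the impulse for exactly two samples, while the recursive filter with a single feedback coefficient keeps producing a geometrically decaying output.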
Figure 14A.3 The first-order low-pass filter characteristic
for different values of $\alpha$.

A first order recursive low-pass filter, $H(q)$ (Fig.
14A.2) can be expressed as

$$y_f(k) = H(q)y(k) = \frac{(1-\alpha)q}{q - \alpha}\,y(k) = \frac{1-\alpha}{1 - \alpha q^{-1}}\,y(k), \quad 0 < \alpha < 1,$$

where $q$ is the unit forward shift operator: $qy(k) = y(k+1)$.

The roots of the denominator and of the numerator of
$H(q)$ are called poles and zeros respectively. The
filter-characteristic with respect to different values of $\alpha$
is shown in Fig.14A.3.

As shown in the figure, the response will be unstable
for $\alpha > 1$. For $\alpha < 1$, lower values of $\alpha$ will lead to the
pole being further away on the real axis from the +1 point of
the unit circle in the z-plane (see Fig.14A.2); such a pole
is known as a fast pole, because it leads to faster
response. The closer the pole is to the +1 point, the slower it
will be, resulting in a relatively sluggish response.
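The effect of the pole location can be seen by running the first-order filter of Fig. 14A.2 on a unit step. Written as a difference equation, H(q^-1) = (1 - α)/(1 - α q^-1) gives y_f(k) = α y_f(k-1) + (1 - α) y(k). The sketch below assumes a zero initial filter state unless stated otherwise; the function name `first_order_lowpass` is hypothetical.

```python
def first_order_lowpass(y, alpha, yf_prev=0.0):
    """Apply y_f(k) = alpha*y_f(k-1) + (1 - alpha)*y(k) to the sequence y.

    alpha is the pole of the filter and must satisfy 0 < alpha < 1;
    yf_prev is the filter state before the first sample (assumed 0).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    out = []
    for sample in y:
        yf_prev = alpha * yf_prev + (1.0 - alpha) * sample
        out.append(yf_prev)
    return out


if __name__ == "__main__":
    step = [1.0] * 5
    for alpha in (0.2, 0.5, 0.8):
        print(alpha, first_order_lowpass(step, alpha))
```

For α = 0.5 the first three step-response values are 0.5, 0.75 and 0.875; a smaller α (pole further from +1) approaches the final value of 1 in fewer samples, while a larger α responds more sluggishly.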

The performance of the low-pass filter is judged by the 
ripples in the pass-band and the sharpness of cut-off in the 
frequency response of the filter. Two standard types of low- 
pass filters are (a) the Butterworth filter and (b) the 
Chebyshev filter. The Butterworth filter produces a 
maximally flat response in the pass-band but a relatively 
large transition band. The Chebyshev filter has a smaller
transition band than the same order Butterworth filter (i.e. 
it has sharper cutoff) but it has ripples either in 
stop-band or pass-band. 

The design procedures of the low-pass filters are 
detailed in the following references. 
PERMEABILITY DATA


The following sets of data are measurements, recorded every
2 minutes, of the green-mix permeability in the process of
iron-ore sintering, collected from an iron and steel plant.

Here the data for ten consecutive hours are presented 
column wise; each column contains data (expressed in 
permeability index or P.I.) for one hour. 


Table 14B.1 Green-mix permeability (P.I.) 
Appendix 14C


COMPOSITE DATA ON MATERNAL ECG CONTAINING FETAL ECG 

This set of data was recorded from the abdominal lead of
an expectant mother during the 37th week of the gestation
period. These data were recorded with an amplifier gain of 
10,000 and 3dB bandwidth of 0.05 to 250 Hz. The data were 
digitized at a sampling rate of 500 Hz. The data presented 
here are obtained by downsampling the digitized data by a 
factor of 4. 

The data are presented serially columnwise, separately 
on each page. 

Table 14C.1 Data on composite maternal ECG 